From e982bc974d9c120a9c4694fa3d4c046986ef2c04 Mon Sep 17 00:00:00 2001
From: Khaled Salhab <khaled.salhab@develeap.com>
Date: Wed, 20 May 2026 22:20:04 +0300
Subject: [PATCH 1/5] chore: ignore .worktrees/ directory

---
 .gitignore | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.gitignore b/.gitignore
index 49a726a..f406287 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,3 +36,4 @@ htmlcov/
 # Local dev tooling
 .claude/
 dist/
+.worktrees/

From fd86599b6d4d33cbfa23b01dc7852ecdb847cbc8 Mon Sep 17 00:00:00 2001
From: Khaled Salhab <khaled.salhab@develeap.com>
Date: Thu, 21 May 2026 02:18:12 +0300
Subject: [PATCH 2/5] docs(plan): MCP rate-limit fixes implementation plan
 (A1+A2+A3+A4+A5+B)

---
 docs/plans/2026-05-21-mcp-rate-limit-fixes.md | 1521 +++++++++++++++++
 1 file changed, 1521 insertions(+)
 create mode 100644 docs/plans/2026-05-21-mcp-rate-limit-fixes.md

diff --git a/docs/plans/2026-05-21-mcp-rate-limit-fixes.md b/docs/plans/2026-05-21-mcp-rate-limit-fixes.md
new file mode 100644
index 0000000..c17aa31
--- /dev/null
+++ b/docs/plans/2026-05-21-mcp-rate-limit-fixes.md
@@ -0,0 +1,1521 @@
+# MCP Rate-Limit Fixes Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use trycycle-executing to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Make `HyperpingMcpClient` / `AsyncHyperpingMcpClient` survive Hyperping MCP server rate limits cleanly: detect JSON-RPC rate-limit errors as `HyperpingRateLimitError` with a parsed `retry_after`, stop wasting bucket slots after a rate-limit on `initialize`, close the TOCTOU race in lazy initialization, keep transient-retry semantics narrow, expose an explicit `ensure_initialized()` for startup health checks, and document the operational guidance users need to actually avoid the trap.
+
+**Architecture:** Two-layer change, applied symmetrically to the sync transport (`src/hyperping/_mcp_transport.py`) and async transport (`src/hyperping/_async_mcp_transport.py`). The transport gains (1) JSON-RPC-error classification before the generic `HyperpingAPIError` fallback, (2) a monotonic-clock "initialize cool-off" deadline that short-circuits `call_tool` while still under the server's `Retry-After`, and (3) a double-checked initialization pattern that holds the lock across the entire handshake. The high-level clients (`src/hyperping/mcp_client.py`, `src/hyperping/_async_mcp_client.py`) expose `ensure_initialized()` that delegates to the transport. README gains an "MCP rate limits" subsection. No public API breakage: existing 429 path, existing `HyperpingRateLimitError` shape, and all existing exception types stay backward compatible. The added `retry_after` value and the new `_init_blocked_until` deadline are the only new state.
+
+**Tech Stack:** Python 3.11+, `httpx` (sync `Client` + async `AsyncClient`), `pydantic` (only for `SecretStr` already in use), `threading.Lock` / `asyncio.Lock`, `time.monotonic()` (sync) and `asyncio.get_event_loop().time()` or `time.monotonic()` (async, monotonic is fine in both), `pytest` + `respx` for tests. No new runtime dependencies.
+
+---
+
+## Background, decisions, and what this plan is NOT doing
+
+### What the server actually does (verified from user reproductions and Hyperping docs)
+
+- Hyperping's MCP server is documented at `https://api.hyperping.io/v1/mcp` as "MCP over HTTP, stateless. No persistent connection." Rate limits documented: 300 req/min/key, shared with REST; REST docs add 800 req/hour/project with HTTP 429 + `Retry-After` + `ratelimit` header.
+- The server enforces an **undocumented per-verb cap on `initialize`** (observed ~5/min) returned as **HTTP 200 + JSON-RPC `error.code = -32000`** with the message literally containing `Hyperping MCP rate limit exceeded for "initialize" (N/5 per minute). Retry after Xs.`. It does **not** return HTTP 429 for this case. Today the SDK's 429 branch never fires for it; control falls through to the generic JSON-RPC error path at `_mcp_transport.py:114-120` / `_async_mcp_transport.py:113-119` which raises a plain `HyperpingAPIError`, so the caller cannot `except HyperpingRateLimitError`.
+- The MCP 2025-03-26 spec does NOT let a client skip `initialize`. The optional `Mcp-Session-Id` header binds *server-side* state to a connection; a fresh transport must always handshake. Hyperping's server is stateless, so there is no session id to capture or persist. **Therefore session-id reuse and on-disk session persistence are out of scope for this plan** (the user reviewed and dropped those ideas).
+
+### What is in scope (locked decisions)
+
+These five workstreams are landed in one cutover. They are independent only at the test level; the production code must ship together because A1 changes which exception type fires and A2/A4 rely on that classification.
+
+1. **A1. JSON-RPC rate-limit detection.** In `_send_rpc`, when the response is HTTP 200 with a JSON-RPC `error` whose `code == -32000` and whose `message` matches the rate-limit pattern, raise `HyperpingRateLimitError(retry_after=..., status_code=resp.status_code)` with the original message preserved and the raw server `error` dict in `response_body`. Parse `Retry after (\d+)s` and `(\d+)/(\d+) per minute` out of the message.
+2. **A2. Initialize cool-off latch.** When `HyperpingRateLimitError` is raised from inside `initialize()`, store `self._init_blocked_until = monotonic() + max(retry_after or 30, 1)`. Subsequent `call_tool` calls short-circuit before issuing any HTTP request: they raise `HyperpingRateLimitError` with a recomputed remaining `retry_after` (and a clear message identifying it as an `initialize` cool-off) until the deadline elapses; only then do we attempt `initialize` again. The latch is cleared by a successful initialize.
+3. **A3. TOCTOU init race fix.** Replace the leak-pattern (lock to read the flag, unlock, then call `initialize()`) with a double-checked pattern: acquire the lock, re-check `_initialized`, do the handshake while still holding the lock, set the flag, release. This works correctly with `asyncio.Lock` across `await`, and avoids the current "two coroutines/threads both see False, both POST `initialize`" bug.
+4. **A4. Narrow auto-retry, explicitly.** The existing `call_tool` retry loop only retries `HyperpingAPIError` with `status_code in {500, 502, 503, 504}`. After A1, a JSON-RPC rate-limit raises `HyperpingRateLimitError` whose `status_code` is 200 (the underlying HTTP code). Two regression tests pin this: rate-limit on `tools/call` is **not** retried by the transport, and the existing 5xx retry path still works. No production code change is strictly required for A4 beyond the tests, but the comment in `call_tool` is updated to say "transient HTTP server errors only; rate-limit errors are never retried at this layer."
+5. **A5. `ensure_initialized()` on the high-level clients.** A trivial public method that calls `self._transport.initialize()` if not yet initialized (respecting the cool-off latch). Lets services that want a startup health check do `mcp.ensure_initialized()` and catch `HyperpingRateLimitError` early, instead of discovering it on the first business call. Symmetric on the async client (`await mcp.ensure_initialized()`).
+6. **B. Documentation.** A new "MCP rate limits" subsection in `README.md` immediately after the existing "MCP Client" section. CHANGELOG entry under a new `[Unreleased]` heading describing user-visible behavior change.
+
+### What is explicitly out of scope (do not implement)
+
+- **`Mcp-Session-Id` capture/replay.** Server is stateless; no session id is returned and capturing one cannot let us skip `initialize`. The user has reviewed and excluded this.
+- **On-disk session persistence.** Same reason; also has a non-trivial security cost.
+- **Changing the MCP server's HTTP status code, cap value, or window semantics.** Server-side; tracked separately by the user as upstream feedback. Not this PR.
+- **Adding a global asyncio "initialize" semaphore across processes.** Cannot help on the documented limit; out of scope.
+- **Refactoring sync/async into a shared base class.** Tempting but introduces an architectural change beyond the rate-limit fix. The two files already mirror each other line-for-line; we maintain that parity by editing both. If a future change benefits from extraction, do it then.
+
+### Why a single cutover
+
+The user asked for "A1 + A2 + A3 + A4 + B + A5" as a single change. The pieces are mutually reinforcing: A1 without A2 still re-thrashes the initialize bucket; A2 without A1 has no typed error to latch on. A3 is a latent bug discovered during the audit and ships in the same PR because it is in the same lock and the same code path. A4 is one comment and two tests. A5 is six lines plus tests. B is documentation. The cutover is small, safe, and behind a typed exception that callers were already advised to catch.
+
+### Backward compatibility contract
+
+- `HyperpingRateLimitError` keeps its existing public attributes (`message`, `retry_after`, `status_code`, `response_body`, `request_id`). The `status_code` for the JSON-RPC rate-limit path is **200** (the HTTP layer), not 429. This is documented in the README and in the exception docstring. Existing callers using `except HyperpingRateLimitError as e: time.sleep(e.retry_after or 30)` are not regressed; they additionally gain coverage for the previously-silent 200/-32000 path.
+- Existing tests that mock `respx.Response(429, ...)` continue to pass unchanged.
+- `HyperpingMcpClient.__init__`, `close()`, `__enter__`, `__exit__`, and every existing tool method keep their signatures. `ensure_initialized()` is purely additive.
+- `McpTransport.__init__` signature is unchanged. The added `_init_blocked_until` is a private attribute.
+
+### Invariants the implementation MUST preserve
+
+1. **Single-flight `initialize` per transport instance.** After A3, no two callers can race a duplicate `initialize`. Verified by a concurrency test that fires N concurrent `call_tool` calls into a transport whose `respx` mock counts how many `initialize` requests it sees: must be exactly 1.
+2. **No silent rate-limit.** Any rate-limit signal from the server (HTTP 429 OR HTTP 200 + JSON-RPC -32000 with the documented message) MUST raise `HyperpingRateLimitError`. Verified by a JSON-RPC rate-limit test and the existing 429 test.
+3. **`Retry-After` is honored.** Parsed from the HTTP `Retry-After` header on 429, and from the JSON-RPC `error.message` on 200/-32000. Verified by parametrized tests covering integer, missing, malformed, and HTTP-date inputs.
+4. **No bucket thrashing while latched.** While `_init_blocked_until > monotonic()`, subsequent `call_tool` invocations issue zero HTTP requests. Verified by a `respx` test that counts `route.call_count == 0` after the latch trips.
+5. **Latch is cleared on successful re-init.** After the deadline elapses, the next `call_tool` triggers exactly one new `initialize`. Verified by advancing a monkeypatched monotonic clock and re-mocking the initialize response.
+6. **Transient HTTP retries do not catch rate-limit.** `call_tool` retry loop only retries the four documented 5xx codes; a `HyperpingRateLimitError` (whether 429 or 200) propagates immediately without re-invoking the request. Verified by a test that mocks a 429 and asserts the route was called exactly once.
+7. **Sync and async parity.** Every test and behavior added in the sync transport has a mirror in the async transport.
+
+---
+
+## File Structure
+
+The change touches a tight, focused set of files. No new files are created in production code. One new test module per transport for the rate-limit-and-init-cooloff behaviors, to keep diffs reviewable. Documentation and changelog updates round it out.
+
+**Production code (modify):**
+- `src/hyperping/_mcp_transport.py` — JSON-RPC rate-limit classification in `_send_rpc`; init cool-off latch state and short-circuit in `call_tool` / `initialize`; double-checked init in `call_tool` and `initialize`; comment on `call_tool` retry block; add `time` import already present.
+- `src/hyperping/_async_mcp_transport.py` — Mirror of the sync change with `asyncio.Lock` held across the awaitable initialize. Uses `time.monotonic()` for the cool-off deadline (process-wide, event-loop-independent).
+- `src/hyperping/mcp_client.py` — Add `ensure_initialized()` method.
+- `src/hyperping/_async_mcp_client.py` — Add `async def ensure_initialized()` method.
+
+**Tests (add new + extend):**
+- `tests/unit/test_mcp_transport.py` — Extend with: JSON-RPC -32000 rate-limit classification (4 cases: with Retry-After, without, malformed, with N/5 per minute); cool-off latch trips on initialize failure; latch short-circuits subsequent `call_tool` with zero HTTP requests; latch clears after monotonic deadline; concurrent first-call de-duplication (TOCTOU regression); `call_tool` does NOT retry rate-limit; comment-only test ensuring 5xx retry path is preserved.
+- `tests/unit/test_async_mcp_transport.py` — Mirror of the above for async, including a concurrent-coroutines test using `asyncio.gather`.
+- `tests/unit/test_mcp_client.py` — `ensure_initialized()` calls through to transport and propagates `HyperpingRateLimitError`.
+- `tests/unit/test_async_mcp_client.py` — Async mirror.
+
+**Documentation (modify):**
+- `README.md` — New "MCP rate limits" subsection immediately after the existing "MCP Client" subsection (around line 190). Mentions: long-lived client per process, undocumented `initialize` cap, the typed exception with `.retry_after`, `ensure_initialized()` for startup checks.
+- `CHANGELOG.md` — New `[Unreleased]` heading at the top with `### Added` and `### Fixed` entries.
+
+**Files NOT modified (informative):**
+- `src/hyperping/exceptions.py` — `HyperpingRateLimitError` is already shaped correctly. No change needed.
+- `src/hyperping/__init__.py` — `HyperpingRateLimitError` and the MCP clients are already exported.
+- `src/hyperping/_version.py` — Version bump and CHANGELOG release-line are reserved for a separate release PR; this plan adds an `[Unreleased]` heading instead, matching the project's prior pattern.
+
+---
+
+## Strategy Gate (decision record)
+
+**Is this the right problem to solve?** Yes. The user's bug report is real, reproducible, and not addressable from the SDK by avoiding `initialize` (server is stateless; spec mandates the handshake). The pragmatic value is in (a) detecting the rate-limit signal correctly, (b) not making it worse by thrashing, and (c) documenting the constraint. Everything else requires server changes the user is filing separately.
+
+**Is the proposed architecture right?** Yes, after eliminating alternatives:
+- *Alternative: shared session id across processes.* Server is documented stateless; no session id is issued. Even if it were, the MCP spec does not let a client skip `initialize` for a fresh transport. Rejected.
+- *Alternative: process-wide singleton `HyperpingMcpClient`.* Would help in one process, doesn't help across processes (cron + watchdog + dev CLI). We document the guidance instead of trying to enforce singletons (which would surprise users).
+- *Alternative: silently swallow the first rate-limit and retry with backoff inside `call_tool`.* Surfaces the same symptom (long mysterious blocking) without the typed exception users need to make scheduling decisions. Rejected.
+- *Alternative: ship a refactor that unifies sync/async transports.* Out of scope; revisit after this lands.
+- *Alternative: detect the rate-limit message only by substring `"rate limit exceeded"` without checking JSON-RPC code.* Fragile against future message changes. We check **both** `code == -32000` AND a regex match on `message`, then prefer the documented format but also accept the substring fallback. Rationale included in the code comment.
+
+**Are there assumptions baked in?** Two, both documented:
+1. The JSON-RPC error code for the rate-limit case is `-32000` (server-defined error range). If the server changes this, the substring check on `message` is the safety net. We do NOT match purely on substring without the code check, because `-32000` is the documented "server-defined" range and matching by message alone would also catch e.g. localized future translations or unrelated errors with the word "rate". The code-plus-message double check is intentional.
+2. `time.monotonic()` is the right clock for the cool-off latch in both sync and async. It is, because we never compare across processes or persist it.
+
+**Ready for task decomposition.** Yes. Architectural direction is stable.
+
+---
+
+## Implementation notes the executor must read
+
+### Regex for parsing the rate-limit message
+
+A single compiled regex covers the observed message format. Define it once at module scope in both transports:
+
+```python
+import re
+
+_MCP_RATE_LIMIT_MARKER = "rate limit"
+_MCP_RATE_LIMIT_RETRY_AFTER_RE = re.compile(r"[Rr]etry after\s+(\d+)\s*s")
+```
+
+The check is:
+
+```python
+if (
+    isinstance(err, dict)
+    and err.get("code") == -32000
+    and isinstance(err.get("message"), str)
+    and _MCP_RATE_LIMIT_MARKER in err["message"].lower()
+):
+    retry_after: int | None = None
+    m = _MCP_RATE_LIMIT_RETRY_AFTER_RE.search(err["message"])
+    if m:
+        try:
+            retry_after = int(m.group(1))
+        except ValueError:  # defensive; regex guarantees digits
+            retry_after = None
+    raise HyperpingRateLimitError(
+        err["message"],
+        retry_after=retry_after,
+        status_code=resp.status_code,
+        response_body=err,
+    )
+```
+
+Placement: inside `_send_rpc`, immediately after `data = resp.json()`, before the existing generic `if "error" in data:` block. That keeps the generic JSON-RPC error path intact for every other `-32xxx` code.
+
+### Cool-off latch contract
+
+Add to `__init__`:
+
+```python
+self._init_blocked_until: float = 0.0  # monotonic seconds; 0.0 means "no cool-off"
+```
+
+In `initialize()`, on success after `self._send_rpc("notifications/initialized", is_notification=True)`, set `self._init_blocked_until = 0.0` (clears any prior latch). Wrap the actual handshake calls in try/except so a `HyperpingRateLimitError` arms the latch:
+
+```python
+try:
+    result = self._send_rpc(
+        "initialize",
+        {"protocolVersion": _PROTOCOL_VERSION, "capabilities": {},
+         "clientInfo": {"name": "hyperping-python", "version": __version__}},
+    )
+    self._send_rpc("notifications/initialized", is_notification=True)
+except HyperpingRateLimitError as exc:
+    wait = exc.retry_after if exc.retry_after and exc.retry_after > 0 else 30
+    self._init_blocked_until = time.monotonic() + wait
+    raise
+```
+
+In `call_tool()`, before any HTTP work, after acquiring the double-checked init lock, check the latch:
+
+```python
+remaining = self._init_blocked_until - time.monotonic()
+if remaining > 0:
+    raise HyperpingRateLimitError(
+        "MCP initialize rate limit cool-off active; retry later",
+        retry_after=int(remaining) + 1,
+        status_code=200,
+    )
+```
+
+Use `int(remaining) + 1` so the advertised value is always >= 1 second and never reports `0` while still blocked. Place this check **after** the double-checked init lock acquisition, **before** the loop. Rationale: a latched transport whose handshake already failed must not silently issue a `tools/call` request; we treat it as "still rate-limited."
+
+### Double-checked init pattern (sync)
+
+Replace the existing TOCTOU pattern in `call_tool`:
+
+```python
+# Before (BUG):
+with self._lock:
+    needs_init = not self._initialized
+if needs_init:
+    self.initialize()
+
+# After (FIX):
+self._ensure_initialized_locked()
+```
+
+Where `_ensure_initialized_locked` is a small private helper on the transport:
+
+```python
+def _ensure_initialized_locked(self) -> None:
+    if self._initialized:
+        return
+    with self._lock:
+        if self._initialized:
+            return
+        # Latch check while holding lock; raises if cool-off active.
+        remaining = self._init_blocked_until - time.monotonic()
+        if remaining > 0:
+            raise HyperpingRateLimitError(
+                "MCP initialize rate limit cool-off active; retry later",
+                retry_after=int(remaining) + 1,
+                status_code=200,
+            )
+        self._initialize_locked()  # performs handshake, sets _initialized
+```
+
+Splitting `initialize()` into a thin public method that takes the lock and a `_initialize_locked()` that assumes the lock is held keeps the public method idempotent and re-entrant for `ensure_initialized()`.
+
+```python
+def initialize(self) -> dict[str, Any]:
+    with self._lock:
+        if self._initialized:
+            # Idempotent: return the cached server result if we still have it,
+            # else an empty dict (the only reason to re-call is informational).
+            return self._init_result
+        return self._initialize_locked()
+
+def _initialize_locked(self) -> dict[str, Any]:
+    # Assumes self._lock is held.
+    try:
+        result = self._send_rpc(...)
+        self._send_rpc("notifications/initialized", is_notification=True)
+    except HyperpingRateLimitError as exc:
+        wait = exc.retry_after if exc.retry_after and exc.retry_after > 0 else 30
+        self._init_blocked_until = time.monotonic() + wait
+        raise
+    self._initialized = True
+    self._init_blocked_until = 0.0
+    self._init_result = result.get("result", {}) if result else {}
+    return self._init_result
+```
+
+Add `self._init_result: dict[str, Any] = {}` in `__init__`.
+
+### Double-checked init pattern (async)
+
+Mirror exactly with `async with self._lock:` held across the awaitable handshake. `asyncio.Lock` is safe across `await`. Use the same `time.monotonic()` clock.
+
+```python
+async def _ensure_initialized_locked(self) -> None:
+    if self._initialized:
+        return
+    async with self._lock:
+        if self._initialized:
+            return
+        remaining = self._init_blocked_until - time.monotonic()
+        if remaining > 0:
+            raise HyperpingRateLimitError(
+                "MCP initialize rate limit cool-off active; retry later",
+                retry_after=int(remaining) + 1,
+                status_code=200,
+            )
+        await self._initialize_locked()
+```
+
+Also import `time` at the top of `_async_mcp_transport.py` (currently not imported).
+
+### Interaction with `_next_id` lock
+
+`_next_id` already takes `self._lock` for the request-id counter. The new `_ensure_initialized_locked` also acquires it. There is no deadlock because:
+- `_ensure_initialized_locked` runs **before** any `_send_rpc` call in `call_tool`, so the lock is released before `_send_rpc` is invoked. `_send_rpc` re-acquires the lock briefly inside `_next_id`. No nested acquisition.
+- `_initialize_locked` calls `_send_rpc`, which calls `_next_id`, which re-acquires `self._lock`. **This is a re-entrancy issue with `threading.Lock` (non-reentrant).**
+
+**Mitigation:** Use `threading.RLock` instead of `threading.Lock` in the sync transport. `asyncio.Lock` is not reentrant either, so for the async transport, restructure `_initialize_locked` to NOT call `_next_id` under the lock — instead, increment the counter directly inside `_initialize_locked` while the lock is already held, by inlining the counter logic. Concretely, in async:
+
+```python
+async def _initialize_locked(self) -> dict[str, Any]:
+    # self._lock is held by caller. Increment the request id directly
+    # rather than awaiting _next_id which would re-acquire the lock.
+    self._request_id += 1
+    init_id = self._request_id
+    # Build and POST the initialize payload inline...
+```
+
+This is ugly but correct. The simpler alternative is to use a separate lock for the request id and a separate lock for initialization state. **Adopt that:** add `self._init_lock` (`threading.Lock` / `asyncio.Lock`) for initialization, and keep the existing `self._lock` solely for the request-id counter. This keeps `_send_rpc` callable from within the init critical section without re-entrancy.
+
+**Final lock structure (adopted):**
+- `self._id_lock` — protects `self._request_id` counter only. Renamed from `_lock` if convenient; the executor should rename for clarity but keep the rename localized. To minimize diff, the executor MAY keep `self._lock` for the id counter and introduce `self._init_lock` for initialization. This is the recommended minimal-diff path.
+
+The plan locks in: **two locks, one for the id counter (existing `self._lock`), one for initialization (`self._init_lock`).** Use `threading.Lock` and `asyncio.Lock` respectively. No RLock; we prefer non-reentrant locks for clarity.
+
+### `ensure_initialized()` on high-level clients
+
+Trivial delegation. On sync `HyperpingMcpClient`:
+
+```python
+def ensure_initialized(self) -> None:
+    """Perform the MCP handshake now if it hasn't happened yet.
+
+    Useful for startup health checks: call this once on boot and catch
+    :class:`HyperpingRateLimitError` so you can decide whether to start
+    the rest of your service. Subsequent tool calls reuse the handshake.
+
+    Raises:
+        HyperpingRateLimitError: If the server rate-limits ``initialize``,
+            either via HTTP 429 or via the JSON-RPC ``-32000`` rate-limit
+            payload. Inspect ``.retry_after`` to back off.
+        HyperpingAuthError: If the API key is invalid.
+    """
+    self._transport.initialize()
+```
+
+On async `AsyncHyperpingMcpClient`:
+
+```python
+async def ensure_initialized(self) -> None:
+    """Async counterpart to :meth:`HyperpingMcpClient.ensure_initialized`."""
+    await self._transport.initialize()
+```
+
+Both rely on `initialize()` being idempotent (no-op if already initialized) after the A3 refactor. Tests cover idempotency and rate-limit propagation.
+
+### Documentation copy (final)
+
+The new README subsection text is fixed in this plan (do not paraphrase):
+
+````markdown
+### MCP rate limits and connection lifecycle
+
+The Hyperping MCP server (`https://api.hyperping.io/v1/mcp`) is stateless over HTTP
+and rate-limits per API key. The publicly documented limit is 300 requests per
+minute shared with the REST API, but the server also enforces a separate, low cap
+on the `initialize` handshake (observed around 5/minute). Because every new
+`HyperpingMcpClient` instance must perform the MCP `initialize` handshake on its
+first call, instantiating the client in a hot path or running several short-lived
+processes against one key will trip this cap.
+
+Operational guidance:
+
+- **Create one `HyperpingMcpClient` per process and reuse it.** Do not instantiate
+  it inside a loop. The first call performs the handshake; subsequent calls reuse
+  it for the life of the client.
+- **Catch `HyperpingRateLimitError` and honour `retry_after`.** Rate-limit signals
+  arrive two ways: as HTTP 429 (with a standard `Retry-After` header) and as a
+  JSON-RPC server error (`code: -32000`, HTTP 200) on `initialize`. Both surface as
+  `HyperpingRateLimitError` with `retry_after` parsed from whichever signal was
+  used. The `status_code` attribute is `429` or `200` respectively.
+- **Use `ensure_initialized()` for startup health checks.** Calling it once on
+  service boot lets you fail fast if the key is already at the `initialize` cap,
+  instead of failing on the first business call.
+- **Several workloads on one key collide on the `initialize` cap.** A weekly cron,
+  a watchdog daemon, and a developer running the CLI cannot all warm up the same
+  API key inside one minute. Use one long-lived process per workload, or separate
+  API keys per workload if your plan allows.
+- **After a rate-limit on `initialize`, the SDK latches a cool-off** so that
+  subsequent `call_tool` invocations on the same client fail fast with
+  `HyperpingRateLimitError` (no extra HTTP traffic) until `retry_after` elapses.
+  This prevents accidentally burning more slots from the bucket.
+
+```python
+from hyperping import HyperpingMcpClient, HyperpingRateLimitError
+
+mcp = HyperpingMcpClient(api_key="sk_...")
+try:
+    mcp.ensure_initialized()
+except HyperpingRateLimitError as e:
+    print(f"MCP cold-start rate-limited; retry in {e.retry_after}s")
+    raise
+
+summary = mcp.get_status_summary()
+```
+````
+
+The CHANGELOG entry text (final):
+
+```markdown
+## [Unreleased]
+
+### Added
+
+- `ensure_initialized()` on `HyperpingMcpClient` and `AsyncHyperpingMcpClient` for
+  startup health checks. Performs the MCP handshake now if it hasn't happened yet
+  and raises `HyperpingRateLimitError` if the server's `initialize` cap is hit.
+- New "MCP rate limits and connection lifecycle" section in README documenting
+  Hyperping's stateless MCP server, the undocumented `initialize` cap, and the
+  recommended client lifetime per process.
+
+### Fixed
+
+- MCP rate-limit errors that the server returns as HTTP 200 with JSON-RPC
+  `error.code = -32000` (notably the `initialize` per-minute cap) are now
+  classified as `HyperpingRateLimitError` with `retry_after` parsed from the
+  message, instead of a generic `HyperpingAPIError`. Existing HTTP 429 handling is
+  unchanged.
+- After a rate-limit on `initialize`, the MCP transport latches a cool-off so
+  subsequent `call_tool` invocations short-circuit with `HyperpingRateLimitError`
+  until the advertised `retry_after` elapses, instead of issuing further HTTP
+  requests that would burn more slots from the bucket.
+- TOCTOU race in lazy `initialize` where two concurrent first calls on the same
+  `HyperpingMcpClient` could each POST `initialize`. The handshake is now
+  performed under a dedicated lock with a double-checked flag.
+```
+
+---
+
+## Completion Standard
+
+Done means:
+
+- The full project test suite (`pytest`) passes, including the new tests, with `--cov-fail-under=85` still green.
+- `ruff check src tests` is clean.
+- `mypy --strict src` is clean (the codebase is configured for strict mypy via `pyproject.toml`).
+- No public symbol has been removed or renamed. `ensure_initialized` is the only addition. `HyperpingRateLimitError` shape is unchanged.
+- The README and CHANGELOG changes render correctly (manual visual is fine; no docs site).
+- The user's exact paste-ready repro snippet (six fresh-client iterations under rate-limit) now raises `HyperpingRateLimitError` on the first hit and, on a sixth iteration that creates yet another client, raises the same typed exception immediately (because the latch is per-transport-instance, the new client will hit the server cap on its own `initialize`, raise typed, and latch its own instance). The behavior improvement is: each instance fails fast and predictably with `.retry_after`. The user can build retry logic around that.
+
+---
+
+## Task Breakdown
+
+Each task is a small, committable unit. Tasks are ordered for safe incremental commits. Run the full project suite after each commit; do not weaken tests.
+
+### Task 1: Add lock-separation and shared regex constants
+
+**Files:**
+- Modify: `src/hyperping/_mcp_transport.py:23-55`
+- Modify: `src/hyperping/_async_mcp_transport.py:22-54`
+
+- [ ] **Step 1: Identify or write the failing test**
+
+This task is a structural refactor that does not change observable behavior. The "failing test" is the existing transport suite which must continue to pass after the lock split.
+
+```bash
+pytest tests/unit/test_mcp_transport.py tests/unit/test_async_mcp_transport.py -v
+```
+
+Expected before refactor: all PASS (no behavior change yet). This is the regression baseline.
+
+- [ ] **Step 2: Run the suite to confirm baseline green**
+
+Run: `pytest tests/unit/test_mcp_transport.py tests/unit/test_async_mcp_transport.py -v`
+Expected: all PASS.
+
+- [ ] **Step 3: Apply the refactor**
+
+In `src/hyperping/_mcp_transport.py`:
+- Add the module-level constants after `_PROTOCOL_VERSION`:
+
+```python
+import re
+
+_MCP_RATE_LIMIT_MARKER = "rate limit"
+_MCP_RATE_LIMIT_RETRY_AFTER_RE = re.compile(r"[Rr]etry after\s+(\d+)\s*s")
+```
+
+- In `McpTransport.__init__`, after `self._lock = threading.Lock()`, add:
+
+```python
+self._init_lock = threading.Lock()
+self._init_blocked_until: float = 0.0
+self._init_result: dict[str, Any] = {}
+```
+
+In `src/hyperping/_async_mcp_transport.py`:
+- Add `import re` and `import time` at the top alongside the existing imports.
+- Add the same two module-level constants after `_PROTOCOL_VERSION`.
+- In `AsyncMcpTransport.__init__`, after `self._lock = asyncio.Lock()`, add:
+
+```python
+self._init_lock = asyncio.Lock()
+self._init_blocked_until: float = 0.0
+self._init_result: dict[str, Any] = {}
+```
+
+- [ ] **Step 4: Run the suite to confirm no regression**
+
+Run: `pytest -q`
+Expected: all PASS, coverage >= 85%.
+
+- [ ] **Step 5: Refactor and verify**
+
+Inspect both files for symmetry: identical constants, identical attribute names. Re-run `pytest -q` and `ruff check src tests`. Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/_mcp_transport.py src/hyperping/_async_mcp_transport.py
+git commit -m "refactor(mcp): split init lock from request-id lock, add cool-off scaffolding"
+```
+
+---
+
+### Task 2: JSON-RPC rate-limit classification in sync `_send_rpc` (A1)
+
+**Files:**
+- Modify: `src/hyperping/_mcp_transport.py:113-121`
+- Test: `tests/unit/test_mcp_transport.py`
+
+- [ ] **Step 1: Write failing tests**
+
+Add these tests at the bottom of `tests/unit/test_mcp_transport.py`:
+
+```python
+@respx.mock
+def test_jsonrpc_rate_limit_classified_as_rate_limit_error():
+    """200 + JSON-RPC code=-32000 with rate-limit message -> HyperpingRateLimitError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                               "(5/5 per minute). Retry after 32s.",
+                },
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True  # bypass handshake to exercise _send_rpc directly
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 32
+    assert exc_info.value.status_code == 200
+    assert "rate limit" in exc_info.value.message.lower()
+    assert exc_info.value.response_body["code"] == -32000
+    transport.close()
+
+
+@respx.mock
+def test_jsonrpc_rate_limit_without_retry_after_seconds():
+    """JSON-RPC rate-limit message without parseable seconds -> retry_after=None."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Try again later.",
+                },
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after is None
+    transport.close()
+
+
+@respx.mock
+def test_jsonrpc_non_ratelimit_error_still_generic_api_error():
+    """Non -32000 JSON-RPC error continues to raise plain HyperpingAPIError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {"code": -32601, "message": "Method not found"},
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingAPIError) as exc_info:
+        transport.call_tool("nonexistent_tool")
+    assert not isinstance(exc_info.value, HyperpingRateLimitError)
+    transport.close()
+
+
+@respx.mock
+def test_jsonrpc_32000_but_not_ratelimit_message_is_generic():
+    """code=-32000 without 'rate limit' substring stays a generic HyperpingAPIError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {"code": -32000, "message": "Some other server error"},
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingAPIError) as exc_info:
+        transport.call_tool("some_tool")
+    assert not isinstance(exc_info.value, HyperpingRateLimitError)
+    transport.close()
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/unit/test_mcp_transport.py -v -k "jsonrpc_rate_limit or jsonrpc_non_ratelimit or jsonrpc_32000"`
+Expected: 2-3 FAIL (the rate-limit ones; the negative test that expects a generic error may already pass). Confirm the failing tests fail with `HyperpingAPIError` raised instead of `HyperpingRateLimitError`.
+
+- [ ] **Step 3: Implement the classification in `_send_rpc`**
+
+In `src/hyperping/_mcp_transport.py`, modify `_send_rpc` so the JSON-RPC error handling section becomes:
+
+```python
+data = resp.json()
+if "error" in data:
+    err = data["error"]
+    if (
+        isinstance(err, dict)
+        and err.get("code") == -32000
+        and isinstance(err.get("message"), str)
+        and _MCP_RATE_LIMIT_MARKER in err["message"].lower()
+    ):
+        retry_after: int | None = None
+        match = _MCP_RATE_LIMIT_RETRY_AFTER_RE.search(err["message"])
+        if match:
+            try:
+                retry_after = int(match.group(1))
+            except ValueError:
+                retry_after = None
+        raise HyperpingRateLimitError(
+            err["message"],
+            retry_after=retry_after,
+            status_code=resp.status_code,
+            response_body=err,
+        )
+    raise HyperpingAPIError(
+        f"MCP error {err.get('code', '?')}: {err.get('message', 'unknown')}",
+        status_code=resp.status_code,
+        response_body=err,
+    )
+return data  # type: ignore[no-any-return]
+```
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/unit/test_mcp_transport.py -v`
+Expected: all PASS, including the four new tests.
+
+- [ ] **Step 5: Refactor and verify**
+
+Inspect: the `_send_rpc` change is the only edit to the sync transport in this task. Re-run `pytest -q` and `ruff check src tests`. Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/_mcp_transport.py tests/unit/test_mcp_transport.py
+git commit -m "feat(mcp): classify JSON-RPC -32000 rate-limit as HyperpingRateLimitError (sync)"
+```
+
+---
+
+### Task 3: JSON-RPC rate-limit classification in async `_send_rpc` (A1 mirror)
+
+**Files:**
+- Modify: `src/hyperping/_async_mcp_transport.py:112-120`
+- Test: `tests/unit/test_async_mcp_transport.py`
+
+- [ ] **Step 1: Write failing tests**
+
+Add the async mirrors of the four tests from Task 2 at the bottom of `tests/unit/test_async_mcp_transport.py`. Use `async def` and `await transport.call_tool(...)`. Same response bodies; same assertions on `exc_info.value.retry_after`, `.status_code`, `.message`, `.response_body`.
+
+```python
+@respx.mock
+async def test_jsonrpc_rate_limit_classified_as_rate_limit_error():
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                               "(5/5 per minute). Retry after 32s.",
+                },
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 32
+    assert exc_info.value.status_code == 200
+    await transport.close()
+```
+
+Repeat for the three other cases. Mirror exactly.
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/unit/test_async_mcp_transport.py -v -k "jsonrpc"`
+Expected: FAIL.
+
+- [ ] **Step 3: Implement in async `_send_rpc`**
+
+Apply the same edit as Task 2 to `src/hyperping/_async_mcp_transport.py`, in the JSON-RPC error block (lines 112-120). Identical logic. No `await`s introduced inside the new branch.
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/unit/test_async_mcp_transport.py -v`
+Expected: all PASS.
+
+- [ ] **Step 5: Refactor and verify**
+
+Compare the sync and async `_send_rpc` JSON-RPC blocks line by line — they must be structurally identical. Run: `pytest -q && ruff check src tests`. Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/_async_mcp_transport.py tests/unit/test_async_mcp_transport.py
+git commit -m "feat(mcp): classify JSON-RPC -32000 rate-limit as HyperpingRateLimitError (async)"
+```
+
+---
+
+### Task 4: Sync TOCTOU init fix and idempotent `initialize()` (A3)
+
+**Files:**
+- Modify: `src/hyperping/_mcp_transport.py` (`initialize`, `call_tool`)
+- Test: `tests/unit/test_mcp_transport.py`
+
+- [ ] **Step 1: Write failing tests**
+
+Add to `tests/unit/test_mcp_transport.py`:
+
+```python
+import threading
+
+
+@respx.mock
+def test_concurrent_first_call_single_initialize():
+    """Two concurrent first-callers must trigger exactly one initialize POST."""
+    # respx counts calls per route; one mock for initialize, one for notification,
+    # one tool reply that can be served twice.
+    init_route = respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(
+                200,
+                json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+            ),
+            httpx.Response(202),  # notifications/initialized
+            httpx.Response(
+                200,
+                json={
+                    "jsonrpc": "2.0",
+                    "id": 2,
+                    "result": {"content": [{"type": "text", "text": json.dumps({"ok": True})}]},
+                },
+            ),
+            httpx.Response(
+                200,
+                json={
+                    "jsonrpc": "2.0",
+                    "id": 3,
+                    "result": {"content": [{"type": "text", "text": json.dumps({"ok": True})}]},
+                },
+            ),
+        ]
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    barrier = threading.Barrier(2)
+    results = []
+
+    def hit():
+        barrier.wait()
+        results.append(transport.call_tool("some_tool"))
+
+    t1 = threading.Thread(target=hit)
+    t2 = threading.Thread(target=hit)
+    t1.start(); t2.start()
+    t1.join(); t2.join()
+
+    # Expected: exactly 4 POSTs total — one initialize, one notification, two tool calls.
+    assert init_route.call_count == 4
+    assert results == [{"ok": True}, {"ok": True}]
+    transport.close()
+
+
+@respx.mock
+def test_initialize_is_idempotent():
+    """Calling initialize() twice does not POST twice."""
+    route = respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(
+                200,
+                json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+            ),
+            httpx.Response(202),
+        ]
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport.initialize()
+    transport.initialize()  # second call must be a no-op
+    assert route.call_count == 2  # initialize + notification, not 4
+    transport.close()
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/unit/test_mcp_transport.py -v -k "concurrent_first_call or initialize_is_idempotent"`
+Expected: at minimum `test_initialize_is_idempotent` FAILs (current code POSTs twice). The concurrent test may be flaky on current code; document the observed failure mode.
+
+- [ ] **Step 3: Implement the double-checked init**
+
+Refactor `initialize()` and `call_tool()` in `src/hyperping/_mcp_transport.py`:
+
+```python
+def initialize(self) -> dict[str, Any]:
+    """Perform MCP handshake if not yet performed. Idempotent and thread-safe."""
+    with self._init_lock:
+        if self._initialized:
+            return self._init_result
+        return self._initialize_locked()
+
+def _initialize_locked(self) -> dict[str, Any]:
+    """Perform the handshake. Assumes self._init_lock is held."""
+    remaining = self._init_blocked_until - time.monotonic()
+    if remaining > 0:
+        raise HyperpingRateLimitError(
+            "MCP initialize rate limit cool-off active; retry later",
+            retry_after=int(remaining) + 1,
+            status_code=200,
+        )
+    try:
+        result = self._send_rpc(
+            "initialize",
+            {
+                "protocolVersion": _PROTOCOL_VERSION,
+                "capabilities": {},
+                "clientInfo": {"name": "hyperping-python", "version": __version__},
+            },
+        )
+        self._send_rpc("notifications/initialized", is_notification=True)
+    except HyperpingRateLimitError as exc:
+        wait = exc.retry_after if exc.retry_after and exc.retry_after > 0 else 30
+        self._init_blocked_until = time.monotonic() + wait
+        raise
+    self._init_result = result.get("result", {}) if result else {}
+    self._initialized = True
+    self._init_blocked_until = 0.0
+    return self._init_result
+```
+
+In `call_tool`, replace the TOCTOU block (lines 151-154) with:
+
+```python
+self.initialize()
+```
+
+That single call is now safe to invoke unconditionally because `initialize()` is idempotent and acquires `_init_lock` internally. The fast path (already initialized) does a single lock acquire and an attribute read — measurably cheap.
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/unit/test_mcp_transport.py -v`
+Expected: all PASS, including the new two.
+
+- [ ] **Step 5: Refactor and verify**
+
+Run the full suite: `pytest -q`. Verify coverage hasn't dropped below 85%. Run `ruff check src tests` and `mypy --strict src`. Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/_mcp_transport.py tests/unit/test_mcp_transport.py
+git commit -m "fix(mcp): close TOCTOU race in lazy initialize; make initialize() idempotent (sync)"
+```
+
+---
+
+### Task 5: Async TOCTOU init fix and idempotent `initialize()` (A3 mirror)
+
+**Files:**
+- Modify: `src/hyperping/_async_mcp_transport.py`
+- Test: `tests/unit/test_async_mcp_transport.py`
+
+- [ ] **Step 1: Write failing tests**
+
+Add to `tests/unit/test_async_mcp_transport.py`:
+
+```python
+import asyncio
+
+
+@respx.mock
+async def test_concurrent_first_call_single_initialize():
+    respx.post(MCP_URL).mock(
+        side_effect=[
+            INIT_RESPONSE,
+            NOTIFICATION_ACCEPTED,
+            _tool_response({"ok": True}, req_id=2),
+            _tool_response({"ok": True}, req_id=3),
+        ],
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    r1, r2 = await asyncio.gather(
+        transport.call_tool("some_tool"),
+        transport.call_tool("some_tool"),
+    )
+    assert r1 == {"ok": True}
+    assert r2 == {"ok": True}
+    await transport.close()
+
+
+@respx.mock
+async def test_initialize_is_idempotent():
+    route = respx.post(MCP_URL).mock(
+        side_effect=[INIT_RESPONSE, NOTIFICATION_ACCEPTED],
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    await transport.initialize()
+    await transport.initialize()
+    assert route.call_count == 2
+    await transport.close()
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/unit/test_async_mcp_transport.py -v -k "concurrent_first_call or initialize_is_idempotent"`
+Expected: FAIL.
+
+- [ ] **Step 3: Implement double-checked async init**
+
+Mirror Task 4 in `src/hyperping/_async_mcp_transport.py`:
+
+```python
+async def initialize(self) -> dict[str, Any]:
+    """Async idempotent and concurrency-safe MCP handshake."""
+    async with self._init_lock:
+        if self._initialized:
+            return self._init_result
+        return await self._initialize_locked()
+
+async def _initialize_locked(self) -> dict[str, Any]:
+    remaining = self._init_blocked_until - time.monotonic()
+    if remaining > 0:
+        raise HyperpingRateLimitError(
+            "MCP initialize rate limit cool-off active; retry later",
+            retry_after=int(remaining) + 1,
+            status_code=200,
+        )
+    try:
+        result = await self._send_rpc(
+            "initialize",
+            {
+                "protocolVersion": _PROTOCOL_VERSION,
+                "capabilities": {},
+                "clientInfo": {"name": "hyperping-python", "version": __version__},
+            },
+        )
+        await self._send_rpc("notifications/initialized", is_notification=True)
+    except HyperpingRateLimitError as exc:
+        wait = exc.retry_after if exc.retry_after and exc.retry_after > 0 else 30
+        self._init_blocked_until = time.monotonic() + wait
+        raise
+    self._init_result = result.get("result", {}) if result else {}
+    self._initialized = True
+    self._init_blocked_until = 0.0
+    return self._init_result
+```
+
+Replace the TOCTOU block in `call_tool` (lines 150-153) with `await self.initialize()`.
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/unit/test_async_mcp_transport.py -v`
+Expected: all PASS.
+
+- [ ] **Step 5: Refactor and verify**
+
+`pytest -q && ruff check src tests && mypy --strict src`. Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/_async_mcp_transport.py tests/unit/test_async_mcp_transport.py
+git commit -m "fix(mcp): close TOCTOU race in lazy initialize; make initialize() idempotent (async)"
+```
+
+---
+
+### Task 6: Initialize cool-off latch behavior — sync (A2)
+
+**Files:**
+- Modify: `src/hyperping/_mcp_transport.py` (`call_tool`)
+- Test: `tests/unit/test_mcp_transport.py`
+
+- [ ] **Step 1: Write failing tests**
+
+Add to `tests/unit/test_mcp_transport.py`:
+
+```python
+@respx.mock
+def test_initialize_rate_limit_latches_cooloff(monkeypatch):
+    """After a rate-limited initialize, further call_tool calls short-circuit."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+
+    rl_response = httpx.Response(
+        200,
+        json={
+            "jsonrpc": "2.0",
+            "id": 1,
+            "error": {
+                "code": -32000,
+                "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                           "(5/5 per minute). Retry after 30s.",
+            },
+        },
+    )
+    route = respx.post(MCP_URL).mock(return_value=rl_response)
+
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")  # triggers initialize, gets latched
+    assert route.call_count == 1  # only the initialize POST was made
+
+    # Subsequent call_tool must not hit the network.
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert route.call_count == 1  # still 1 — no further HTTP requests
+    assert exc_info.value.retry_after is not None
+    assert exc_info.value.retry_after >= 1
+
+    transport.close()
+
+
+@respx.mock
+def test_initialize_cooloff_clears_after_deadline(monkeypatch):
+    """Once the cool-off elapses, initialize is attempted again."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+
+    rl_msg = {
+        "jsonrpc": "2.0",
+        "id": 1,
+        "error": {
+            "code": -32000,
+            "message": "Hyperping MCP rate limit exceeded. Retry after 10s.",
+        },
+    }
+    ok_init = httpx.Response(
+        200,
+        json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+    )
+    ok_tool = httpx.Response(
+        200,
+        json={
+            "jsonrpc": "2.0",
+            "id": 2,
+            "result": {"content": [{"type": "text", "text": json.dumps({"ok": True})}]},
+        },
+    )
+
+    respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(200, json=rl_msg),  # first initialize: rate-limited
+            ok_init,                            # second initialize: success
+            httpx.Response(202),                # notifications/initialized
+            ok_tool,                            # tool call
+        ]
+    )
+
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+
+    # Still latched.
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+
+    # Advance past the deadline.
+    fake_now["t"] += 100.0
+    result = transport.call_tool("some_tool")
+    assert result == {"ok": True}
+    transport.close()
+```
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/unit/test_mcp_transport.py -v -k "latches_cooloff or cooloff_clears"`
+Expected: FAIL. The first test fails because today the second `call_tool` would re-attempt `initialize` and POST again.
+
+- [ ] **Step 3: Implement the short-circuit in `call_tool`**
+
+In `src/hyperping/_mcp_transport.py`, the `call_tool` block now reads:
+
+```python
+def call_tool(self, tool_name: str, arguments: dict[str, Any] | None = None) -> Any:
+    self.initialize()  # idempotent, latch-aware, raises HyperpingRateLimitError
+                       # if init is rate-limited or still under cool-off
+    ...
+```
+
+Because `_initialize_locked` already checks `_init_blocked_until` and raises before any HTTP work, and `initialize()` is now idempotent, no further change is needed inside `call_tool`. The two new tests must pass.
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/unit/test_mcp_transport.py -v`
+Expected: all PASS.
+
+- [ ] **Step 5: Refactor and verify**
+
+`pytest -q && ruff check src tests && mypy --strict src`. Expected: clean. Confirm route.call_count assertions are exact, not >=, to nail down the no-HTTP-during-cooloff invariant.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/_mcp_transport.py tests/unit/test_mcp_transport.py
+git commit -m "feat(mcp): latch initialize cool-off after rate-limit to stop bucket thrashing (sync)"
+```
+
+---
+
+### Task 7: Initialize cool-off latch behavior — async (A2 mirror)
+
+**Files:**
+- Modify: `src/hyperping/_async_mcp_transport.py`
+- Test: `tests/unit/test_async_mcp_transport.py`
+
+- [ ] **Step 1: Write failing tests**
+
+Mirror Task 6 tests in `tests/unit/test_async_mcp_transport.py`. Monkeypatch path: `"hyperping._async_mcp_transport.time.monotonic"`.
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/unit/test_async_mcp_transport.py -v -k "latches_cooloff or cooloff_clears"`
+Expected: FAIL.
+
+- [ ] **Step 3: Implement (already done by Task 5's `_initialize_locked`)**
+
+The async cool-off behavior is already implemented in Task 5's `_initialize_locked`. Verify by running the new tests. Replace the `call_tool` TOCTOU block (already done) with `await self.initialize()`.
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/unit/test_async_mcp_transport.py -v`
+Expected: all PASS.
+
+- [ ] **Step 5: Refactor and verify**
+
+`pytest -q && ruff check src tests && mypy --strict src`. Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/_async_mcp_transport.py tests/unit/test_async_mcp_transport.py
+git commit -m "feat(mcp): latch initialize cool-off after rate-limit to stop bucket thrashing (async)"
+```
+
+---
+
+### Task 8: Pin "transient retry does not catch rate-limit" with explicit tests (A4)
+
+**Files:**
+- Modify: `src/hyperping/_mcp_transport.py` (comment only)
+- Modify: `src/hyperping/_async_mcp_transport.py` (comment only)
+- Test: `tests/unit/test_mcp_transport.py`
+- Test: `tests/unit/test_async_mcp_transport.py`
+
+- [ ] **Step 1: Write failing tests**
+
+Add to `tests/unit/test_mcp_transport.py`:
+
+```python
+@respx.mock
+def test_rate_limit_is_not_retried_by_call_tool():
+    """call_tool's transient retry loop must NOT retry HyperpingRateLimitError."""
+    route = respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            429, text="Rate limited", headers={"retry-after": "5"},
+        ),
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL, max_retries=3)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+    assert route.call_count == 1  # NOT 4 — no retry attempts
+    transport.close()
+
+
+@respx.mock
+def test_jsonrpc_rate_limit_is_not_retried_by_call_tool():
+    """JSON-RPC -32000 rate-limit (HTTP 200) must NOT trigger retries either."""
+    route = respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Retry after 5s.",
+                },
+            },
+        ),
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL, max_retries=3)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+    assert route.call_count == 1
+    transport.close()
+```
+
+Mirror both in `tests/unit/test_async_mcp_transport.py`.
+
+- [ ] **Step 2: Run tests to verify they pass (current behavior already correct)**
+
+Run: `pytest tests/unit/test_mcp_transport.py tests/unit/test_async_mcp_transport.py -v -k "not_retried"`
+Expected: PASS already (the existing `except HyperpingAPIError as exc:` only retries 5xx; `HyperpingRateLimitError` is a subclass of `HyperpingAPIError` but its `status_code` is 429 or 200, neither of which is in the retry set). These tests pin the behavior.
+
+If for any reason they fail, the existing retry filter needs the explicit `and not isinstance(exc, HyperpingRateLimitError)` guard. The plan permits adding that guard if needed.
+
+- [ ] **Step 3: Tighten the comment**
+
+In `src/hyperping/_mcp_transport.py`, update the `call_tool` docstring lines:
+
+```python
+"""Call an MCP tool and return parsed response data.
+
+Auto-initializes on first call. Extracts and parses the JSON
+string from ``result.content[0].text``.
+
+Retries automatically on transient HTTP server errors (500, 502, 503, 504)
+up to ``max_retries`` times with exponential back-off. Rate-limit errors
+(HTTP 429 or JSON-RPC -32000) are NEVER retried at this layer; they raise
+:class:`HyperpingRateLimitError` immediately so callers can honour
+``retry_after``.
+"""
+```
+
+Mirror in the async transport.
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest -q`. Expected: all PASS.
+
+- [ ] **Step 5: Refactor and verify**
+
+`ruff check src tests && mypy --strict src`. Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/_mcp_transport.py src/hyperping/_async_mcp_transport.py \
+       tests/unit/test_mcp_transport.py tests/unit/test_async_mcp_transport.py
+git commit -m "test(mcp): pin that call_tool retry never catches rate-limit; clarify docstring"
+```
+
+---
+
+### Task 9: `ensure_initialized()` on high-level clients (A5)
+
+**Files:**
+- Modify: `src/hyperping/mcp_client.py`
+- Modify: `src/hyperping/_async_mcp_client.py`
+- Test: `tests/unit/test_mcp_client.py`
+- Test: `tests/unit/test_async_mcp_client.py`
+
+- [ ] **Step 1: Write failing tests**
+
+Add to `tests/unit/test_mcp_client.py`:
+
+```python
+from hyperping.exceptions import HyperpingRateLimitError
+
+
+def test_ensure_initialized_delegates_to_transport():
+    client = make_client()
+    client.ensure_initialized()
+    client._transport.initialize.assert_called_once_with()
+
+
+def test_ensure_initialized_propagates_rate_limit():
+    client = make_client()
+    client._transport.initialize.side_effect = HyperpingRateLimitError(
+        "rate limited on initialize", retry_after=30, status_code=200,
+    )
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        client.ensure_initialized()
+    assert exc_info.value.retry_after == 30
+```
+
+(Add `import pytest` at the top of the test file if not already imported.)
+
+Mirror for `tests/unit/test_async_mcp_client.py` with an async `make_client` and `await client.ensure_initialized()`.
+
+- [ ] **Step 2: Run tests to verify they fail**
+
+Run: `pytest tests/unit/test_mcp_client.py tests/unit/test_async_mcp_client.py -v -k "ensure_initialized"`
+Expected: FAIL with `AttributeError: 'HyperpingMcpClient' object has no attribute 'ensure_initialized'`.
+
+- [ ] **Step 3: Implement on both clients**
+
+Add to `src/hyperping/mcp_client.py`, in the `HyperpingMcpClient` class, immediately after `_call`:
+
+```python
+def ensure_initialized(self) -> None:
+    """Perform the MCP handshake now if it hasn't happened yet.
+
+    Useful for startup health checks: call this once on boot and catch
+    :class:`HyperpingRateLimitError` so you can decide whether to start
+    the rest of your service. Subsequent tool calls reuse the handshake.
+
+    Raises:
+        HyperpingRateLimitError: If the server rate-limits ``initialize``,
+            either via HTTP 429 or via the JSON-RPC ``-32000`` rate-limit
+            payload. Inspect ``.retry_after`` to back off.
+        HyperpingAuthError: If the API key is invalid.
+    """
+    self._transport.initialize()
+```
+
+Add to `src/hyperping/_async_mcp_client.py`, in `AsyncHyperpingMcpClient`, immediately after `_call`:
+
+```python
+async def ensure_initialized(self) -> None:
+    """Async counterpart to :meth:`HyperpingMcpClient.ensure_initialized`."""
+    await self._transport.initialize()
+```
+
+- [ ] **Step 4: Run tests to verify they pass**
+
+Run: `pytest tests/unit/test_mcp_client.py tests/unit/test_async_mcp_client.py -v`
+Expected: all PASS.
+
+- [ ] **Step 5: Refactor and verify**
+
+`pytest -q && ruff check src tests && mypy --strict src`. Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add src/hyperping/mcp_client.py src/hyperping/_async_mcp_client.py \
+       tests/unit/test_mcp_client.py tests/unit/test_async_mcp_client.py
+git commit -m "feat(mcp): add ensure_initialized() on HyperpingMcpClient and async counterpart"
+```
+
+---
+
+### Task 10: README "MCP rate limits" subsection (B)
+
+**Files:**
+- Modify: `README.md` (insert after the existing "MCP Client" section, around line 195)
+
+- [ ] **Step 1: Identify the insertion point**
+
+The new subsection goes immediately after the paragraph ending in `use the exported Pydantic models (e.g.,`OnCallSchedule`,`EscalationPolicy`) for validation if needed.` (around line 194). Insert before `### Healthchecks`.
+
+- [ ] **Step 2: Insert the new subsection**
+
+Paste the exact copy from the "Documentation copy (final)" section of this plan. Do not paraphrase. The block begins with `### MCP rate limits and connection lifecycle` and ends with the fenced Python example showing `mcp.ensure_initialized()`.
+
+- [ ] **Step 3: Verify markdown renders cleanly**
+
+Run any local markdown linter if available, or visually inspect for unbalanced fences. `ruff` does not lint markdown; rely on inspection.
+
+- [ ] **Step 4: Run the full project suite**
+
+Run: `pytest -q`
+Expected: PASS (no code changes in this task, but confirms nothing else regressed).
+
+- [ ] **Step 5: Refactor and verify**
+
+Re-read the new subsection in context. Confirm it flows from the prior MCP Client section and that the example uses an already-imported symbol (`HyperpingMcpClient`, `HyperpingRateLimitError`).
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add README.md
+git commit -m "docs: add MCP rate limits and connection lifecycle section"
+```
+
+---
+
+### Task 11: CHANGELOG `[Unreleased]` entry (B)
+
+**Files:**
+- Modify: `CHANGELOG.md`
+
+- [ ] **Step 1: Insert new heading at the top**
+
+Above the existing `## [1.6.0] - 2026-05-06` heading, insert the exact `[Unreleased]` block from the "Documentation copy (final)" section.
+
+- [ ] **Step 2: Verify file structure**
+
+The file now begins with the existing preamble, then `## [Unreleased]`, then the 1.6.0 section. No version bump is performed (release versioning is a separate PR per project convention).
+
+- [ ] **Step 3: Run the full project suite**
+
+Run: `pytest -q`
+Expected: PASS.
+
+- [ ] **Step 4: Refactor and verify**
+
+Inspect: the `[Unreleased]` block lists `### Added` and `### Fixed`, both matching the changes in this PR. No stale entries.
+
+- [ ] **Step 5: Commit**
+
+```bash
+git add CHANGELOG.md
+git commit -m "docs: add Unreleased CHANGELOG entry for MCP rate-limit fixes"
+```
+
+---
+
+### Task 12: Final integration sweep
+
+**Files:**
+- All previously modified.
+
+- [ ] **Step 1: Run full project verification**
+
+Run in sequence:
+
+```bash
+ruff check src tests
+mypy --strict src
+pytest -q
+```
+
+Expected: all green. `pytest` enforces `--cov-fail-under=85` from `pyproject.toml`; coverage must remain at or above that threshold.
+
+- [ ] **Step 2: Manually verify the user's repro pattern**
+
+Open a Python REPL inside the worktree (after `pip install -e .` in a venv if not already done) and exercise the user's snippet against a `respx` mock that returns the JSON-RPC rate-limit on `initialize`:
+
+```python
+import respx, httpx
+from hyperping import HyperpingMcpClient, HyperpingRateLimitError
+
+with respx.mock:
+    respx.post("https://api.hyperping.io/v1/mcp").mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0", "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                               "(5/5 per minute). Retry after 32s.",
+                },
+            },
+        ),
+    )
+    for i in range(6):
+        with HyperpingMcpClient(api_key="sk_test") as mcp:
+            try:
+                mcp.ensure_initialized()
+            except HyperpingRateLimitError as e:
+                print(i, "rate-limited; retry_after=", e.retry_after)
+                break
+```
+
+Expected output: `0 rate-limited; retry_after= 32`. The exception is the typed one, with the parsed `retry_after`. (This manual check is not added to the suite; the per-test coverage above is the canonical verification.)
+
+- [ ] **Step 3: Sanity-check the diff**
+
+```bash
+git diff --stat main...HEAD
+```
+
+Expected files changed:
+- `src/hyperping/_mcp_transport.py`
+- `src/hyperping/_async_mcp_transport.py`
+- `src/hyperping/mcp_client.py`
+- `src/hyperping/_async_mcp_client.py`
+- `tests/unit/test_mcp_transport.py`
+- `tests/unit/test_async_mcp_transport.py`
+- `tests/unit/test_mcp_client.py`
+- `tests/unit/test_async_mcp_client.py`
+- `README.md`
+- `CHANGELOG.md`
+- `docs/plans/2026-05-21-mcp-rate-limit-fixes.md` (this plan)
+
+No other files. Verify.
+
+- [ ] **Step 4: Final commit (if anything was left)**
+
+If steps 1-3 surfaced no additional changes, this task closes without a new commit. Otherwise:
+
+```bash
+git add -p   # carefully stage only the residual fixes
+git commit -m "chore(mcp): final cleanup after rate-limit fixes"
+```
+
+- [ ] **Step 5: Confirm completion criteria**
+
+Re-verify against the "Completion Standard" section above:
+
+- pytest green with coverage >= 85%.
+- ruff clean.
+- mypy --strict clean.
+- Public API unchanged except for the additive `ensure_initialized()`.
+- README and CHANGELOG updated.
+
+Done.
+
+---
+
+## Remember
+
+- Sync and async transports must remain symmetric. Every behavior change lands in both.
+- The two locks (`self._lock` for the id counter, `self._init_lock` for initialization state) are intentional. Do not collapse them back into one.
+- `time.monotonic()` is the cool-off clock. Do not switch to wall clock or event-loop time.
+- The JSON-RPC rate-limit classifier checks BOTH `code == -32000` AND the message substring. Do not match on one alone.
+- `HyperpingRateLimitError.status_code` is the underlying HTTP code (429 for HTTP rate-limits, 200 for JSON-RPC rate-limits). This is intentional and documented.
+- No new runtime dependencies. Tests already have `respx` and `pytest-asyncio`.
+- DRY, YAGNI, Red/Green/Refactor TDD, frequent commits.

From 7d3600952a04d413b3cc2cf580f2bee657e46d68 Mon Sep 17 00:00:00 2001
From: Khaled Salhab <khaled.salhab@develeap.com>
Date: Thu, 21 May 2026 02:30:06 +0300
Subject: [PATCH 3/5] docs(plan): MCP rate-limit fixes test plan
 (A1+A2+A3+A4+A5+B)

---
 ...26-05-21-mcp-rate-limit-fixes-test-plan.md | 574 ++++++++++++++++++
 1 file changed, 574 insertions(+)
 create mode 100644 docs/plans/2026-05-21-mcp-rate-limit-fixes-test-plan.md

diff --git a/docs/plans/2026-05-21-mcp-rate-limit-fixes-test-plan.md b/docs/plans/2026-05-21-mcp-rate-limit-fixes-test-plan.md
new file mode 100644
index 0000000..34f9a55
--- /dev/null
+++ b/docs/plans/2026-05-21-mcp-rate-limit-fixes-test-plan.md
@@ -0,0 +1,574 @@
+# MCP Rate-Limit Fixes Test Plan
+
+Companion to `docs/plans/2026-05-21-mcp-rate-limit-fixes.md`. Drives quality for the
+A1+A2+A3+A4+A5+B cutover that landed JSON-RPC rate-limit detection, an
+`initialize` cool-off latch, the TOCTOU fix, narrowed transient retry, the new
+`ensure_initialized()` method, and the README/CHANGELOG documentation.
+
+## Strategy reconciliation
+
+No prior testing strategy was approved in the conversation; the implementation
+plan itself is prescriptive about the testing approach. This test plan adopts
+that prescription (respx-mocked unit tests, sync and async parity, no new
+runtime dependencies) and adds the few coverage gaps the plan listed loosely
+(idempotency of `initialize()`, concurrent first-call de-duplication via
+`threading.Barrier` / `asyncio.gather`, latch short-circuit observed at the
+`route.call_count` boundary, a monkeypatched `time.monotonic` to advance the
+cool-off deadline, and `ensure_initialized()` propagation tests on both
+high-level clients). Two assumptions are documented inline below where they
+arise. No cost or scope change versus the plan, so no
+`## Strategy changes requiring user approval` section is needed.
+
+## Harness requirements
+
+No new harnesses are built. All tests run inside the existing
+`pytest` + `respx` + `pytest-asyncio` setup defined in `pyproject.toml`. The
+plan adopts the following conventions for the new tests, each chosen because
+the harness it relies on already exists in the suite:
+
+- **respx route handles** (`respx.post(MCP_URL).mock(...)`): the plan uses
+  `route.call_count` to assert "no further HTTP traffic" while the latch is
+  active. This is the highest-fidelity check available against the real
+  transport (`httpx.Client`/`AsyncClient`) without a live server.
+- **`monkeypatch` of `hyperping._mcp_transport.time.monotonic` and
+  `hyperping._async_mcp_transport.time.monotonic`**: the cool-off deadline is
+  a `time.monotonic()` value; monkeypatching the module-level reference lets
+  tests advance time without sleeping. The plan explicitly chose this clock so
+  the patch path is stable.
+- **`threading.Barrier(2)` + two `threading.Thread`s** for the sync TOCTOU
+  test; **`asyncio.gather`** for its async mirror. Both exercise the
+  double-checked init under genuine concurrency rather than a serialized
+  re-entry.
+- **`unittest.mock.MagicMock` / `AsyncMock`** swapped onto `client._transport`
+  for the high-level `ensure_initialized()` propagation tests, matching the
+  pattern already used throughout `tests/unit/test_mcp_client.py` and
+  `tests/unit/test_async_mcp_client.py`.
+
+The action space is the SDK's public/observable surface:
+
+- `McpTransport.initialize`, `McpTransport.call_tool`, `McpTransport.close`,
+  context-manager entry/exit, and the private `_send_rpc` path responsible
+  for HTTP-status and JSON-RPC error classification.
+- `AsyncMcpTransport.initialize`, `AsyncMcpTransport.call_tool`,
+  `AsyncMcpTransport.close`, async context-manager, and async `_send_rpc`.
+- `HyperpingMcpClient.ensure_initialized` (new) plus the rest of its already
+  covered tool methods that route through `_transport.call_tool`.
+- `AsyncHyperpingMcpClient.ensure_initialized` (new) plus the existing async
+  tool methods.
+- The `HyperpingRateLimitError` exception shape: `message`, `retry_after`,
+  `status_code`, `response_body`.
+
+Sources of truth:
+
+- The implementation plan (`docs/plans/2026-05-21-mcp-rate-limit-fixes.md`)
+  for behavioral contracts, invariants, the documented rate-limit message
+  format, the cool-off latch contract, and the two-lock structure.
+- The MCP 2025-03-26 spec (cited in the plan) for handshake semantics
+  (`initialize` then `notifications/initialized`), referenced only to confirm
+  test responses are realistic.
+- Hyperping's documented rate-limit signal shapes (HTTP 429 + `Retry-After`;
+  HTTP 200 + JSON-RPC `error.code = -32000` + message text) per the plan.
+- The existing test suite for fixture/helper patterns (`INIT_RESPONSE`,
+  `NOTIFICATION_ACCEPTED`, `_tool_response`).
+
+## Test plan
+
+Tests are numbered in priority order. Each entry lists name, type,
+disposition, harness, preconditions, actions, expected outcome, and notable
+interactions.
+
+### Acceptance gates (problem-statement red checks)
+
+#### 1. JSON-RPC rate-limit on `initialize` raises `HyperpingRateLimitError` (sync)
+
+- **Name**: A fresh client whose `initialize` is rate-limited via JSON-RPC
+  `-32000` surfaces a typed `HyperpingRateLimitError` with the parsed
+  `retry_after`.
+- **Type**: scenario
+- **Disposition**: new
+- **Harness**: pytest + respx, `McpTransport` real instance
+- **Preconditions**: respx mocks the MCP endpoint to return HTTP 200 with
+  body `{"jsonrpc": "2.0", "id": 1, "error": {"code": -32000, "message":
+  'Hyperping MCP rate limit exceeded for "initialize" (5/5 per minute).
+  Retry after 32s.'}}`. `transport._initialized = True` so the test
+  exercises the `_send_rpc` classifier directly (the cool-off and TOCTOU
+  tests cover the full path).
+- **Actions**: `transport.call_tool("some_tool")`.
+- **Expected outcome** (sources: implementation plan A1, exception module):
+  `HyperpingRateLimitError` is raised. `exc.retry_after == 32`,
+  `exc.status_code == 200`, `"rate limit" in exc.message.lower()`,
+  `exc.response_body["code"] == -32000`. The exception is NOT a generic
+  `HyperpingAPIError` carrying only the message string.
+- **Interactions**: `_send_rpc` JSON-RPC error classification path,
+  `HyperpingRateLimitError.__init__`.
+
+#### 2. JSON-RPC rate-limit on `initialize` raises `HyperpingRateLimitError` (async)
+
+- **Name**: Async mirror of Test 1.
+- **Type**: scenario
+- **Disposition**: new
+- **Harness**: pytest-asyncio + respx, `AsyncMcpTransport` real instance.
+- **Preconditions**: same respx body as Test 1; `transport._initialized =
+  True`.
+- **Actions**: `await transport.call_tool("some_tool")`.
+- **Expected outcome**: same as Test 1.
+- **Interactions**: async `_send_rpc` classifier.
+
+#### 3. User repro: six fresh clients under JSON-RPC rate-limit fail fast and typed (sync)
+
+- **Name**: Reproducing the exact paste-ready snippet from the user's bug
+  report, every iteration raises `HyperpingRateLimitError` with the parsed
+  `retry_after` instead of a generic `HyperpingAPIError`.
+- **Type**: scenario / regression
+- **Disposition**: new (test file:
+  `tests/unit/test_mcp_transport.py`)
+- **Harness**: pytest + respx, full `McpTransport` constructed inside the
+  loop (so each iteration performs its own `initialize`).
+- **Preconditions**: respx mocks every POST to MCP_URL with the JSON-RPC
+  rate-limit body from Test 1.
+- **Actions**: loop `for _ in range(6)`: construct `McpTransport`, call
+  `transport.call_tool("list_monitors", {"status": "ssl_expiring"})`,
+  capture exception, `transport.close()`.
+- **Expected outcome** (source: plan §Completion Standard): every iteration
+  raises `HyperpingRateLimitError` with `retry_after == 32` and
+  `status_code == 200`. Six exceptions, none generic, none silent. Each
+  client closes cleanly.
+- **Interactions**: full handshake path including `_initialize_locked`,
+  cool-off latch arming (each instance latches its own); `close()`.
+
+#### 4. Existing HTTP 429 detection on `tools/call` still works (sync, regression)
+
+- **Name**: HTTP 429 with `Retry-After: 30` continues to raise
+  `HyperpingRateLimitError(retry_after=30, status_code=429)`.
+- **Type**: regression (characterization)
+- **Disposition**: existing (`test_call_tool_http_429_with_retry_after`,
+  `test_call_tool_http_429_no_retry_after`,
+  `test_call_tool_http_429_non_integer_retry_after`)
+- **Harness**: pytest + respx.
+- **Preconditions**: as in current tests.
+- **Actions**: as in current tests.
+- **Expected outcome**: existing assertions hold unchanged. Backward-compat
+  contract (plan §Backward compatibility) verified.
+- **Interactions**: HTTP-status branch in `_send_rpc`.
+
+#### 5. Async HTTP 429 detection regression
+
+- **Name**: Async mirror of Test 4.
+- **Type**: regression
+- **Disposition**: existing in `tests/unit/test_async_mcp_transport.py` (the
+  existing async 429 cases); confirm they still pass.
+- **Harness**: pytest-asyncio + respx.
+- **Preconditions / Actions / Expected outcome / Interactions**: same as
+  Test 4 against `AsyncMcpTransport`.
+
+### High-value scenario and integration tests
+
+#### 6. Concurrent first calls trigger exactly one `initialize` (sync)
+
+- **Name**: Two threads racing the first `call_tool` on the same transport
+  produce a single `initialize` POST, not two.
+- **Type**: integration / invariant
+- **Disposition**: new
+- **Harness**: pytest + respx + `threading.Barrier(2)` + two
+  `threading.Thread`.
+- **Preconditions**: respx mock returns a four-message side_effect
+  sequence: `INIT_RESPONSE`, `NOTIFICATION_ACCEPTED`, two `_tool_response`s
+  (one per thread).
+- **Actions**: Both threads wait on `barrier.wait()`, then call
+  `transport.call_tool("some_tool")`. Join both.
+- **Expected outcome** (source: plan §Invariant 1, A3): total respx
+  `route.call_count == 4` (one `initialize`, one `notifications/initialized`,
+  two tool calls). Both threads return `{"ok": True}`. No
+  `IndexError`/`StopIteration` from the side_effect being exhausted by a
+  duplicate `initialize`.
+- **Interactions**: `_init_lock` double-checked pattern; `_send_rpc`;
+  `_next_id` under `_lock` (the request-id lock, deliberately separate per
+  plan §Final lock structure).
+- **Assumption documented in test docstring**: respx's side_effect ordering
+  is deterministic for serialized POSTs, but the two `tools/call` responses
+  are interchangeable (both `{"ok": True}`), removing any ordering brittleness
+  between the two threads.
+
+#### 7. Concurrent first calls trigger exactly one `initialize` (async)
+
+- **Name**: Async mirror of Test 6 via `asyncio.gather`.
+- **Type**: integration / invariant
+- **Disposition**: new
+- **Harness**: pytest-asyncio + respx.
+- **Preconditions**: same four-message side_effect on the route.
+- **Actions**: `await asyncio.gather(transport.call_tool("some_tool"),
+  transport.call_tool("some_tool"))`.
+- **Expected outcome**: both results equal `{"ok": True}`; route
+  `call_count == 4`. The async `_init_lock` correctly serializes the
+  handshake while letting the second coroutine see `_initialized=True` on
+  re-acquire.
+- **Interactions**: `asyncio.Lock` semantics across `await`; async
+  `_initialize_locked`.
+
+#### 8. `initialize()` is idempotent (sync)
+
+- **Name**: Calling `transport.initialize()` twice POSTs only one handshake
+  pair (initialize + notification).
+- **Type**: invariant
+- **Disposition**: new
+- **Harness**: pytest + respx.
+- **Preconditions**: respx mock with two-message side_effect (`INIT_RESPONSE`,
+  `NOTIFICATION_ACCEPTED`).
+- **Actions**: `transport.initialize(); transport.initialize();`.
+- **Expected outcome** (source: plan §A3, idempotent contract):
+  `route.call_count == 2` (initialize + notification, not 4). Second call
+  returns the cached `_init_result`.
+- **Interactions**: `_init_lock` fast-path return; `_init_result` cache.
+
+#### 9. `initialize()` is idempotent (async)
+
+- **Name**: Async mirror of Test 8.
+- **Type**: invariant
+- **Disposition**: new
+- **Harness**: pytest-asyncio + respx.
+- **Preconditions / Actions**: `await transport.initialize()` twice.
+- **Expected outcome**: `route.call_count == 2`.
+- **Interactions**: async `_init_lock` fast path.
+
+#### 10. Cool-off latch trips on rate-limited `initialize` and blocks further HTTP (sync)
+
+- **Name**: After a rate-limited `initialize`, subsequent `call_tool`
+  invocations raise `HyperpingRateLimitError` with no further network
+  traffic until the cool-off deadline.
+- **Type**: scenario
+- **Disposition**: new
+- **Harness**: pytest + respx + `monkeypatch` of
+  `hyperping._mcp_transport.time.monotonic`.
+- **Preconditions**: monkeypatched monotonic clock starts at `t = 1000.0`.
+  respx mock returns the JSON-RPC rate-limit body (`Retry after 30s.`) for
+  every POST.
+- **Actions**:
+  1. `transport.call_tool("some_tool")` -> expect
+     `HyperpingRateLimitError`; assert `route.call_count == 1`.
+  2. `transport.call_tool("some_tool")` (clock unchanged) -> expect
+     `HyperpingRateLimitError`; assert `route.call_count == 1` (no extra
+     HTTP request).
+- **Expected outcome** (source: plan §A2, §Invariant 4): second call raises
+  typed error with `retry_after >= 1`, route `call_count` stays at 1.
+- **Interactions**: `_initialize_locked` arms `_init_blocked_until`; the
+  next `initialize()` call observes the remaining cool-off and raises before
+  any `_send_rpc`.
+
+#### 11. Cool-off latch trips and blocks further HTTP (async mirror of Test 10)
+
+- **Name**: Async mirror.
+- **Type**: scenario
+- **Disposition**: new
+- **Harness**: pytest-asyncio + respx +
+  `monkeypatch` of `hyperping._async_mcp_transport.time.monotonic`.
+- **Preconditions / Actions / Expected outcome / Interactions**: same shape
+  as Test 10.
+
+#### 12. Cool-off clears after deadline elapses; next call re-initializes (sync)
+
+- **Name**: Once the monotonic clock advances past `_init_blocked_until`,
+  the next `call_tool` performs a fresh `initialize` and the tool call
+  succeeds.
+- **Type**: scenario / invariant
+- **Disposition**: new
+- **Harness**: pytest + respx + monkeypatched `time.monotonic`.
+- **Preconditions**: respx side_effect sequence: rate-limit response,
+  successful `INIT_RESPONSE`, `NOTIFICATION_ACCEPTED`, successful tool
+  response.
+- **Actions**:
+  1. `transport.call_tool("some_tool")` -> latch arms; expect
+     `HyperpingRateLimitError`.
+  2. `transport.call_tool("some_tool")` (clock unchanged) -> still latched.
+  3. Advance `fake_now["t"] += 100.0`.
+  4. `result = transport.call_tool("some_tool")` -> succeeds.
+- **Expected outcome** (source: plan §Invariant 5): final result is
+  `{"ok": True}`. The cool-off was cleared (`_init_blocked_until == 0.0`
+  after success), and exactly one re-initialize occurred.
+- **Interactions**: `_initialize_locked` clears `_init_blocked_until` on
+  success.
+
+#### 13. Cool-off clears after deadline (async mirror of Test 12)
+
+- **Name**: Async mirror of Test 12.
+- **Type**: scenario / invariant
+- **Disposition**: new
+- **Harness**: pytest-asyncio + respx + monkeypatched
+  `_async_mcp_transport.time.monotonic`.
+- **Preconditions / Actions / Expected outcome / Interactions**: same as
+  Test 12.
+
+#### 14. `ensure_initialized()` performs the handshake and is idempotent (sync, transport-integrated)
+
+- **Name**: Calling `client.ensure_initialized()` triggers `initialize` then
+  `notifications/initialized`; calling it a second time is a no-op.
+- **Type**: integration
+- **Disposition**: new (`tests/unit/test_mcp_client.py`)
+- **Harness**: pytest + respx with a real `HyperpingMcpClient` (do NOT
+  swap `_transport` for this case so that we exercise the real path
+  end-to-end through the transport).
+- **Preconditions**: respx mock returns `INIT_RESPONSE` and
+  `NOTIFICATION_ACCEPTED` in sequence.
+- **Actions**: `client = HyperpingMcpClient(api_key="sk_test",
+  base_url=MCP_URL); client.ensure_initialized();
+  client.ensure_initialized(); client.close()`.
+- **Expected outcome** (source: plan §A5): `route.call_count == 2` (one
+  initialize, one notification). No exception. The second call is a no-op.
+- **Interactions**: `HyperpingMcpClient.ensure_initialized` delegates to
+  `transport.initialize`; the transport's `_init_lock` provides idempotency.
+- **Assumption documented in test docstring**: We allow `MCP_URL` import
+  via `tests/unit/test_mcp_transport.MCP_URL` (already a re-export of the
+  module URL); using the same base_url keeps the existing transport tests'
+  pattern.
+
+#### 15. `ensure_initialized()` delegates to transport and propagates `HyperpingRateLimitError` (sync, mocked transport)
+
+- **Name**: With `_transport` replaced by a `MagicMock`,
+  `client.ensure_initialized()` calls `_transport.initialize` exactly once,
+  and a `HyperpingRateLimitError` raised by the transport propagates
+  unchanged.
+- **Type**: unit
+- **Disposition**: new (`tests/unit/test_mcp_client.py`)
+- **Harness**: pytest + `unittest.mock.MagicMock` (matches existing
+  `make_client()` pattern in that file).
+- **Preconditions**: `client._transport = MagicMock()`.
+- **Actions**:
+  1. `client.ensure_initialized()` then assert
+     `client._transport.initialize.assert_called_once_with()`.
+  2. Set `client._transport.initialize.side_effect =
+     HyperpingRateLimitError("rate limited on initialize", retry_after=30,
+     status_code=200)`; call `client.ensure_initialized()`.
+- **Expected outcome**: First action invokes `initialize` exactly once.
+  Second action raises `HyperpingRateLimitError` with
+  `retry_after == 30` and `status_code == 200`.
+- **Interactions**: delegation contract; exception propagation through the
+  high-level client.
+
+#### 16. `ensure_initialized()` propagates `HyperpingRateLimitError` (async)
+
+- **Name**: Async mirror of Test 15.
+- **Type**: unit
+- **Disposition**: new (`tests/unit/test_async_mcp_client.py`)
+- **Harness**: pytest-asyncio + `unittest.mock.AsyncMock`.
+- **Preconditions / Actions / Expected outcome / Interactions**: same as
+  Test 15 with `AsyncHyperpingMcpClient` and `await
+  client.ensure_initialized()`.
+
+### Boundary, differential, and negative tests
+
+#### 17. JSON-RPC rate-limit without parseable seconds yields `retry_after=None` (sync)
+
+- **Name**: A `-32000` message without a `Retry after Xs` substring still
+  classifies as rate-limit with `retry_after=None`.
+- **Type**: boundary
+- **Disposition**: new
+- **Harness**: pytest + respx.
+- **Preconditions**: respx returns body `{"code": -32000, "message":
+  "Hyperping MCP rate limit exceeded. Try again later."}`.
+- **Actions**: `transport.call_tool("some_tool")` with
+  `transport._initialized = True`.
+- **Expected outcome**: `HyperpingRateLimitError` raised, `retry_after is
+  None`. Classification still fires on the substring + code combo.
+
+#### 18. JSON-RPC rate-limit without parseable seconds (async)
+
+- **Name**: Async mirror of Test 17.
+- **Type**: boundary
+- **Disposition**: new
+- **Harness**: pytest-asyncio + respx.
+
+#### 19. Non-rate-limit JSON-RPC error stays a generic `HyperpingAPIError` (sync)
+
+- **Name**: A JSON-RPC error with `code == -32601` ("Method not found") is
+  NOT misclassified as a rate-limit; it raises plain `HyperpingAPIError`.
+- **Type**: boundary / regression
+- **Disposition**: new (negative classification check; the existing
+  `test_call_tool_jsonrpc_error` is similar but the new test explicitly
+  asserts `not isinstance(exc, HyperpingRateLimitError)`).
+- **Harness**: pytest + respx.
+- **Preconditions / Actions / Expected outcome**: see plan Task 2 Step 1
+  test verbatim. Verifies the classifier checks both code AND message.
+
+#### 20. `-32000` without rate-limit substring stays generic (sync)
+
+- **Name**: A `-32000` error whose message does not contain "rate limit"
+  raises generic `HyperpingAPIError`, not `HyperpingRateLimitError`.
+- **Type**: boundary
+- **Disposition**: new
+- **Harness**: pytest + respx.
+- **Preconditions**: body `{"code": -32000, "message": "Some other server
+  error"}`.
+- **Expected outcome** (source: plan §Strategy Gate alternative 5, intentional
+  double-check): generic `HyperpingAPIError`. `isinstance(exc,
+  HyperpingRateLimitError)` is False.
+
+#### 21. Non-rate-limit JSON-RPC error stays generic (async)
+
+- **Name**: Async mirror of Test 19.
+- **Type**: boundary / regression
+- **Disposition**: new.
+
+#### 22. `-32000` without rate-limit substring stays generic (async)
+
+- **Name**: Async mirror of Test 20.
+- **Type**: boundary
+- **Disposition**: new.
+
+### Invariant tests
+
+#### 23. `call_tool` retry loop never retries `HyperpingRateLimitError` from HTTP 429 (sync)
+
+- **Name**: With `max_retries=3` and a permanent HTTP 429 response,
+  `call_tool` makes exactly one HTTP request before raising.
+- **Type**: invariant
+- **Disposition**: new
+- **Harness**: pytest + respx, `transport._initialized = True`.
+- **Preconditions**: respx returns `httpx.Response(429, text="Rate limited",
+  headers={"retry-after": "5"})` for every POST.
+- **Actions**: `transport.call_tool("some_tool")` raises.
+- **Expected outcome** (source: plan §Invariant 6, §A4): exception is
+  `HyperpingRateLimitError`; `route.call_count == 1` (NOT 4).
+- **Interactions**: retry filter in `call_tool` (only retries 500/502/503/504);
+  exception hierarchy (`HyperpingRateLimitError` inherits from
+  `HyperpingAPIError` but its `status_code` is outside the retry set).
+
+#### 24. `call_tool` retry loop never retries JSON-RPC rate-limit (sync)
+
+- **Name**: With `max_retries=3` and a permanent JSON-RPC `-32000`
+  rate-limit response, `call_tool` makes exactly one HTTP request before
+  raising.
+- **Type**: invariant
+- **Disposition**: new
+- **Harness**: pytest + respx, `transport._initialized = True`.
+- **Preconditions**: respx returns HTTP 200 with the rate-limit JSON-RPC body.
+- **Actions**: `transport.call_tool("some_tool")` raises.
+- **Expected outcome**: `HyperpingRateLimitError` raised;
+  `route.call_count == 1`.
+- **Interactions**: the `status_code=200` on the JSON-RPC path must NOT
+  satisfy the `in (500, 502, 503, 504)` filter even though
+  `HyperpingRateLimitError` is a subclass of `HyperpingAPIError`.
+
+#### 25. `call_tool` rate-limit retry invariants (async)
+
+- **Name**: Async mirrors of Tests 23 and 24.
+- **Type**: invariant
+- **Disposition**: new (two new tests).
+- **Harness**: pytest-asyncio + respx.
+
+#### 26. 5xx retry path still functions after the refactor (sync, regression)
+
+- **Name**: A persistent HTTP 502 still raises `HyperpingAPIError` after
+  exactly `max_retries + 1` attempts.
+- **Type**: regression
+- **Disposition**: existing (`test_call_tool_retry_exhausted`); confirm
+  unchanged behavior.
+- **Harness**: pytest + respx + `patch("hyperping._mcp_transport.time.sleep")`.
+- **Expected outcome**: existing assertions hold; the comment-only change
+  in `call_tool` does not alter behavior. Implicitly verifies that the
+  refactor and the new `_init_lock` did not regress transient-retry
+  semantics.
+
+#### 27. 5xx retry path still functions (async)
+
+- **Name**: Equivalent of Test 26 against `AsyncMcpTransport`.
+- **Type**: regression
+- **Disposition**: existing async retry test (if present) or new minimal
+  mirror.
+
+### Coverage gates and meta-checks
+
+#### 28. Coverage threshold preserved
+
+- **Name**: `pytest --cov-fail-under=85` still passes after the change.
+- **Type**: invariant (project-level)
+- **Disposition**: existing (`pyproject.toml` `[tool.pytest.ini_options]`).
+- **Harness**: pytest's coverage plugin (already configured).
+- **Expected outcome**: full project run is green and coverage is at or
+  above 85%.
+
+#### 29. Static checks clean
+
+- **Name**: `ruff check src tests` and `mypy --strict src` are clean after
+  the change.
+- **Type**: invariant
+- **Disposition**: existing (project linters).
+- **Harness**: `ruff`, `mypy` configured in `pyproject.toml`.
+- **Expected outcome**: zero findings. Catches accidental imports,
+  unused names, or type-annotation drift introduced by the new lock and
+  cool-off attributes.
+
+#### 30. Documentation artifacts present and well-formed
+
+- **Name**: README contains the new "MCP rate limits and connection
+  lifecycle" subsection, and CHANGELOG contains an `[Unreleased]` block with
+  `### Added` and `### Fixed`.
+- **Type**: regression (artifact)
+- **Disposition**: new lightweight assertion (one test or one CI grep step;
+  prefer a pytest assertion to keep the gate in the suite).
+- **Harness**: pytest reading the repo files via `pathlib`.
+- **Preconditions**: working directory is the repo root.
+- **Actions**: read `README.md` and assert the literal heading
+  `### MCP rate limits and connection lifecycle` appears; read
+  `CHANGELOG.md` and assert `## [Unreleased]` appears as a top-level
+  heading and is followed somewhere in the file by `### Added` and
+  `### Fixed`.
+- **Expected outcome** (source: plan §B, §Documentation copy (final)):
+  both assertions pass. Catches accidental removal during merges.
+- **Interactions**: none; pure file content check.
+- **Note**: this is the smallest reliable replacement for human review of
+  the docs change; the alternative ("a person reads the README") is
+  forbidden by the test-plan rules.
+
+## Coverage summary
+
+### Covered
+
+- **A1 (JSON-RPC -32000 rate-limit classification)**: sync Tests 1, 17, 19,
+  20; async Tests 2, 18, 21, 22.
+- **A2 (cool-off latch)**: sync Tests 10, 12; async Tests 11, 13.
+- **A3 (TOCTOU init race + idempotent `initialize`)**: sync Tests 6, 8;
+  async Tests 7, 9.
+- **A4 (transient retry never catches rate-limit)**: sync Tests 23, 24;
+  async Test 25 (two cases). Test 26/27 protect the 5xx retry path.
+- **A5 (`ensure_initialized` on high-level clients)**: sync Tests 14, 15;
+  async Test 16.
+- **B (docs)**: Test 30 pins the README and CHANGELOG artifacts.
+- **Backward compatibility (HTTP 429 path, existing exception shape)**:
+  Tests 4, 5 (existing) plus the meta-tests 28 and 29.
+- **User repro fidelity**: Test 3 reproduces the user's paste-ready snippet
+  shape verbatim, and is the single most important acceptance gate for the
+  bug report.
+
+### Explicitly excluded per strategy
+
+- **Live integration against the real Hyperping MCP server**: the plan
+  forbids this (no live calls, no paid resources). Tests use `respx`
+  mocks; the user's repro fidelity is verified by mirroring the documented
+  rate-limit body shape.
+- **Cross-process session-id reuse and on-disk session persistence**:
+  removed from scope by the plan (server is stateless; spec mandates
+  handshake). No tests written.
+- **Server-side fixes (HTTP 429 vs 200, real rolling window, honest
+  Retry-After arithmetic, separate cap on `initialize`)**: tracked
+  separately as upstream feedback to Hyperping. Not testable from this
+  repo.
+- **Performance benchmarks**: the change is small and the fast path
+  (`initialize` already done) is a single lock acquire and attribute read.
+  Risk is low; the plan does not request perf tests. A latent regression
+  would surface as a wall-clock blowup in the existing suite.
+
+### Risks the exclusions carry
+
+- **Server message drift**: if Hyperping changes the rate-limit message to
+  drop the word "rate limit" or change the JSON-RPC code, the classifier
+  silently falls back to generic `HyperpingAPIError`. Mitigation: the
+  classifier checks BOTH code and substring (plan §Strategy Gate), so a
+  partial change still works; a total change would be caught the next time
+  a user reports the bug. Adding a probe test would require a live server.
+- **MCP spec churn**: a future MCP protocol version might change handshake
+  semantics. Out of scope for this PR; the `_PROTOCOL_VERSION` constant
+  pins the negotiated version and any drift will surface in the existing
+  `test_initialize` (Test 14's transport-integrated path).
+- **Server returns 429 + JSON-RPC `-32000` simultaneously**: undefined by
+  the spec. Current implementation handles 429 first (HTTP-status branch
+  runs before the JSON-RPC error branch), which is the desirable
+  precedence. Not tested explicitly; would be a follow-up if observed.

From 2acc4fce573a77f1c89e4e9f55108467fceb6098 Mon Sep 17 00:00:00 2001
From: Khaled Salhab <khaled.salhab@develeap.com>
Date: Thu, 21 May 2026 02:40:22 +0300
Subject: [PATCH 4/5] fix(mcp): detect rate limits, latch initialize cool-off,
 close TOCTOU race

Classify HTTP 200 + JSON-RPC error code -32000 with a rate-limit message
as HyperpingRateLimitError with the parsed retry_after, alongside the
existing HTTP 429 path. After a rate-limited initialize, latch a
process-monotonic cool-off so subsequent call_tool invocations on the
same client fail fast with HyperpingRateLimitError until the deadline
elapses, without burning more slots from the server's bucket.

Make initialize() idempotent under a dedicated _init_lock with the
double-checked flag, closing the lazy-init TOCTOU race where two
concurrent first calls could each POST initialize. Mirror the change in
the async transport using asyncio.Lock held across the awaitable
handshake. Pin that call_tool's transient retry never catches a
rate-limit (HTTP 429 or JSON-RPC -32000).

Add ensure_initialized() on HyperpingMcpClient and AsyncHyperpingMcpClient
for startup health checks, delegating to the transport's idempotent
initialize(). Document the new behaviour in README under "MCP rate
limits and connection lifecycle" and record it under an [Unreleased]
CHANGELOG block.
---
 CHANGELOG.md                           |  26 ++
 README.md                              |  45 ++++
 src/hyperping/_async_mcp_client.py     |  14 +
 src/hyperping/_async_mcp_transport.py  |  91 +++++--
 src/hyperping/_mcp_transport.py        |  90 +++++--
 src/hyperping/mcp_client.py            |  15 ++
 tests/unit/test_async_mcp_client.py    |  46 ++++
 tests/unit/test_async_mcp_transport.py | 257 ++++++++++++++++++
 tests/unit/test_mcp_client.py          |  89 +++++++
 tests/unit/test_mcp_transport.py       | 350 +++++++++++++++++++++++++
 10 files changed, 985 insertions(+), 38 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d57cc2e..52eee80 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,32 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+
+- `ensure_initialized()` on `HyperpingMcpClient` and `AsyncHyperpingMcpClient` for
+  startup health checks. Performs the MCP handshake now if it hasn't happened yet
+  and raises `HyperpingRateLimitError` if the server's `initialize` cap is hit.
+- New "MCP rate limits and connection lifecycle" section in README documenting
+  Hyperping's stateless MCP server, the undocumented `initialize` cap, and the
+  recommended client lifetime per process.
+
+### Fixed
+
+- MCP rate-limit errors that the server returns as HTTP 200 with JSON-RPC
+  `error.code = -32000` (notably the `initialize` per-minute cap) are now
+  classified as `HyperpingRateLimitError` with `retry_after` parsed from the
+  message, instead of a generic `HyperpingAPIError`. Existing HTTP 429 handling is
+  unchanged.
+- After a rate-limit on `initialize`, the MCP transport latches a cool-off so
+  subsequent `call_tool` invocations short-circuit with `HyperpingRateLimitError`
+  until the advertised `retry_after` elapses, instead of issuing further HTTP
+  requests that would burn more slots from the bucket.
+- TOCTOU race in lazy `initialize` where two concurrent first calls on the same
+  `HyperpingMcpClient` could each POST `initialize`. The handshake is now
+  performed under a dedicated lock with a double-checked flag.
+
 ## [1.6.0] - 2026-05-06
 
 ### Added
diff --git a/README.md b/README.md
index 6088078..cb2f1cc 100644
--- a/README.md
+++ b/README.md
@@ -193,6 +193,51 @@ The MCP client uses the same API key as `HyperpingClient`. All methods return
 plain dicts/lists; use the exported Pydantic models (e.g., `OnCallSchedule`,
 `EscalationPolicy`) for validation if needed.
 
+### MCP rate limits and connection lifecycle
+
+The Hyperping MCP server (`https://api.hyperping.io/v1/mcp`) is stateless over HTTP
+and rate-limits per API key. The publicly documented limit is 300 requests per
+minute shared with the REST API, but the server also enforces a separate, low cap
+on the `initialize` handshake (observed around 5/minute). Because every new
+`HyperpingMcpClient` instance must perform the MCP `initialize` handshake on its
+first call, instantiating the client in a hot path or running several short-lived
+processes against one key will trip this cap.
+
+Operational guidance:
+
+- **Create one `HyperpingMcpClient` per process and reuse it.** Do not instantiate
+  it inside a loop. The first call performs the handshake; subsequent calls reuse
+  it for the life of the client.
+- **Catch `HyperpingRateLimitError` and honour `retry_after`.** Rate-limit signals
+  arrive two ways: as HTTP 429 (with a standard `Retry-After` header) and as a
+  JSON-RPC server error (`code: -32000`, HTTP 200) on `initialize`. Both surface as
+  `HyperpingRateLimitError` with `retry_after` parsed from whichever signal was
+  used. The `status_code` attribute is `429` or `200` respectively.
+- **Use `ensure_initialized()` for startup health checks.** Calling it once on
+  service boot lets you fail fast if the key is already at the `initialize` cap,
+  instead of failing on the first business call.
+- **Several workloads on one key collide on the `initialize` cap.** A weekly cron,
+  a watchdog daemon, and a developer running the CLI cannot all warm up the same
+  API key inside one minute. Use one long-lived process per workload, or separate
+  API keys per workload if your plan allows.
+- **After a rate-limit on `initialize`, the SDK latches a cool-off** so that
+  subsequent `call_tool` invocations on the same client fail fast with
+  `HyperpingRateLimitError` (no extra HTTP traffic) until `retry_after` elapses.
+  This prevents accidentally burning more slots from the bucket.
+
+```python
+from hyperping import HyperpingMcpClient, HyperpingRateLimitError
+
+mcp = HyperpingMcpClient(api_key="sk_...")
+try:
+    mcp.ensure_initialized()
+except HyperpingRateLimitError as e:
+    print(f"MCP cold-start rate-limited; retry in {e.retry_after}s")
+    raise
+
+summary = mcp.get_status_summary()
+```
+
 ### Healthchecks
 
 ```python
diff --git a/src/hyperping/_async_mcp_client.py b/src/hyperping/_async_mcp_client.py
index fab127c..df98061 100644
--- a/src/hyperping/_async_mcp_client.py
+++ b/src/hyperping/_async_mcp_client.py
@@ -63,6 +63,20 @@ async def _call(self, tool: str, args: dict[str, Any] | None = None) -> Any:
         """Call an MCP tool via the transport."""
         return await self._transport.call_tool(tool, args or {})
 
+    async def ensure_initialized(self) -> None:
+        """Perform the MCP handshake now if it hasn't happened yet.
+
+        Async counterpart to
+        :meth:`hyperping.mcp_client.HyperpingMcpClient.ensure_initialized`.
+
+        Raises:
+            HyperpingRateLimitError: If the server rate-limits ``initialize``,
+                either via HTTP 429 or via the JSON-RPC ``-32000`` rate-limit
+                payload. Inspect ``.retry_after`` to back off.
+            HyperpingAuthError: If the API key is invalid.
+        """
+        await self._transport.initialize()
+
     # ==================== Context Manager ====================
 
     async def close(self) -> None:
diff --git a/src/hyperping/_async_mcp_transport.py b/src/hyperping/_async_mcp_transport.py
index 72792e4..05568fb 100644
--- a/src/hyperping/_async_mcp_transport.py
+++ b/src/hyperping/_async_mcp_transport.py
@@ -4,6 +4,8 @@
 
 import asyncio
 import json
+import re
+import time
 from typing import Any
 
 import httpx
@@ -21,6 +23,9 @@
 
 _PROTOCOL_VERSION = "2025-03-26"
 
+_MCP_RATE_LIMIT_MARKER = "rate limit"
+_MCP_RATE_LIMIT_RETRY_AFTER_RE = re.compile(r"[Rr]etry after\s+(\d+)\s*s")
+
 
 class AsyncMcpTransport:
     """Async low-level JSON-RPC 2.0 client for the Hyperping MCP server.
@@ -51,6 +56,9 @@ def __init__(
         self._initialized = False
         self._request_id = 0
         self._lock = asyncio.Lock()
+        self._init_lock = asyncio.Lock()
+        self._init_blocked_until: float = 0.0
+        self._init_result: dict[str, Any] = {}
         self._max_retries = max_retries
 
     async def _next_id(self) -> int:
@@ -112,6 +120,25 @@ async def _send_rpc(
         data = resp.json()
         if "error" in data:
             err = data["error"]
+            if (
+                isinstance(err, dict)
+                and err.get("code") == -32000
+                and isinstance(err.get("message"), str)
+                and _MCP_RATE_LIMIT_MARKER in err["message"].lower()
+            ):
+                rl_retry_after: int | None = None
+                match = _MCP_RATE_LIMIT_RETRY_AFTER_RE.search(err["message"])
+                if match:
+                    try:
+                        rl_retry_after = int(match.group(1))
+                    except ValueError:  # defensive; regex guarantees digits
+                        rl_retry_after = None
+                raise HyperpingRateLimitError(
+                    err["message"],
+                    retry_after=rl_retry_after,
+                    status_code=resp.status_code,
+                    response_body=err,
+                )
             raise HyperpingAPIError(
                 f"MCP error {err.get('code', '?')}: {err.get('message', 'unknown')}",
                 status_code=resp.status_code,
@@ -120,19 +147,45 @@ async def _send_rpc(
         return data  # type: ignore[no-any-return]
 
     async def initialize(self) -> dict[str, Any]:
-        """Perform MCP handshake. Called automatically on first tool call."""
-        result = await self._send_rpc(
-            "initialize",
-            {
-                "protocolVersion": _PROTOCOL_VERSION,
-                "capabilities": {},
-                "clientInfo": {"name": "hyperping-python", "version": __version__},
-            },
-        )
-        await self._send_rpc("notifications/initialized", is_notification=True)
-        async with self._lock:
-            self._initialized = True
-        return result.get("result", {}) if result else {}
+        """Async idempotent and concurrency-safe MCP handshake.
+
+        Calling this more than once on the same transport is a no-op after the
+        first successful handshake. While an ``initialize`` cool-off latch is
+        active, raises :class:`HyperpingRateLimitError` without issuing any
+        HTTP request.
+        """
+        async with self._init_lock:
+            if self._initialized:
+                return self._init_result
+            return await self._initialize_locked()
+
+    async def _initialize_locked(self) -> dict[str, Any]:
+        """Perform the handshake. Assumes ``self._init_lock`` is held."""
+        remaining = self._init_blocked_until - time.monotonic()
+        if remaining > 0:
+            raise HyperpingRateLimitError(
+                "MCP initialize rate limit cool-off active; retry later",
+                retry_after=int(remaining) + 1,
+                status_code=200,
+            )
+        try:
+            result = await self._send_rpc(
+                "initialize",
+                {
+                    "protocolVersion": _PROTOCOL_VERSION,
+                    "capabilities": {},
+                    "clientInfo": {"name": "hyperping-python", "version": __version__},
+                },
+            )
+            await self._send_rpc("notifications/initialized", is_notification=True)
+        except HyperpingRateLimitError as exc:
+            wait = exc.retry_after if exc.retry_after and exc.retry_after > 0 else 30
+            self._init_blocked_until = time.monotonic() + wait
+            raise
+        self._init_result = result.get("result", {}) if result else {}
+        self._initialized = True
+        self._init_blocked_until = 0.0
+        return self._init_result
 
     async def call_tool(
         self,
@@ -144,13 +197,13 @@ async def call_tool(
         Auto-initializes on first call. Extracts and parses the JSON
         string from ``result.content[0].text``.
 
-        Retries automatically on transient server errors (HTTP 500, 502,
-        503, 504) up to ``max_retries`` times with exponential back-off.
+        Retries automatically on transient HTTP server errors (500, 502, 503, 504)
+        up to ``max_retries`` times with exponential back-off. Rate-limit errors
+        (HTTP 429 or JSON-RPC -32000) are NEVER retried at this layer; they raise
+        :class:`HyperpingRateLimitError` immediately so callers can honour
+        ``retry_after``.
         """
-        async with self._lock:
-            needs_init = not self._initialized
-        if needs_init:
-            await self.initialize()
+        await self.initialize()
 
         last_exc: Exception | None = None
         for attempt in range(self._max_retries + 1):
diff --git a/src/hyperping/_mcp_transport.py b/src/hyperping/_mcp_transport.py
index 15ff959..c36333e 100644
--- a/src/hyperping/_mcp_transport.py
+++ b/src/hyperping/_mcp_transport.py
@@ -3,6 +3,7 @@
 from __future__ import annotations
 
 import json
+import re
 import threading
 import time
 from typing import Any
@@ -22,6 +23,9 @@
 
 _PROTOCOL_VERSION = "2025-03-26"
 
+_MCP_RATE_LIMIT_MARKER = "rate limit"
+_MCP_RATE_LIMIT_RETRY_AFTER_RE = re.compile(r"[Rr]etry after\s+(\d+)\s*s")
+
 
 class McpTransport:
     """Low-level JSON-RPC 2.0 client for the Hyperping MCP server.
@@ -52,6 +56,9 @@ def __init__(
         self._initialized = False
         self._request_id = 0
         self._lock = threading.Lock()
+        self._init_lock = threading.Lock()
+        self._init_blocked_until: float = 0.0
+        self._init_result: dict[str, Any] = {}
         self._max_retries = max_retries
 
     def _next_id(self) -> int:
@@ -113,6 +120,25 @@ def _send_rpc(
         data = resp.json()
         if "error" in data:
             err = data["error"]
+            if (
+                isinstance(err, dict)
+                and err.get("code") == -32000
+                and isinstance(err.get("message"), str)
+                and _MCP_RATE_LIMIT_MARKER in err["message"].lower()
+            ):
+                rl_retry_after: int | None = None
+                match = _MCP_RATE_LIMIT_RETRY_AFTER_RE.search(err["message"])
+                if match:
+                    try:
+                        rl_retry_after = int(match.group(1))
+                    except ValueError:  # defensive; regex guarantees digits
+                        rl_retry_after = None
+                raise HyperpingRateLimitError(
+                    err["message"],
+                    retry_after=rl_retry_after,
+                    status_code=resp.status_code,
+                    response_body=err,
+                )
             raise HyperpingAPIError(
                 f"MCP error {err.get('code', '?')}: {err.get('message', 'unknown')}",
                 status_code=resp.status_code,
@@ -121,19 +147,45 @@ def _send_rpc(
         return data  # type: ignore[no-any-return]
 
     def initialize(self) -> dict[str, Any]:
-        """Perform MCP handshake. Called automatically on first tool call."""
-        result = self._send_rpc(
-            "initialize",
-            {
-                "protocolVersion": _PROTOCOL_VERSION,
-                "capabilities": {},
-                "clientInfo": {"name": "hyperping-python", "version": __version__},
-            },
-        )
-        self._send_rpc("notifications/initialized", is_notification=True)
-        with self._lock:
-            self._initialized = True
-        return result.get("result", {}) if result else {}
+        """Perform MCP handshake if not yet performed. Idempotent and thread-safe.
+
+        Calling this more than once on the same transport is a no-op after the
+        first successful handshake. While an ``initialize`` cool-off latch is
+        active, raises :class:`HyperpingRateLimitError` without issuing any
+        HTTP request.
+        """
+        with self._init_lock:
+            if self._initialized:
+                return self._init_result
+            return self._initialize_locked()
+
+    def _initialize_locked(self) -> dict[str, Any]:
+        """Perform the handshake. Assumes ``self._init_lock`` is held."""
+        remaining = self._init_blocked_until - time.monotonic()
+        if remaining > 0:
+            raise HyperpingRateLimitError(
+                "MCP initialize rate limit cool-off active; retry later",
+                retry_after=int(remaining) + 1,
+                status_code=200,
+            )
+        try:
+            result = self._send_rpc(
+                "initialize",
+                {
+                    "protocolVersion": _PROTOCOL_VERSION,
+                    "capabilities": {},
+                    "clientInfo": {"name": "hyperping-python", "version": __version__},
+                },
+            )
+            self._send_rpc("notifications/initialized", is_notification=True)
+        except HyperpingRateLimitError as exc:
+            wait = exc.retry_after if exc.retry_after and exc.retry_after > 0 else 30
+            self._init_blocked_until = time.monotonic() + wait
+            raise
+        self._init_result = result.get("result", {}) if result else {}
+        self._initialized = True
+        self._init_blocked_until = 0.0
+        return self._init_result
 
     def call_tool(
         self,
@@ -145,13 +197,13 @@ def call_tool(
         Auto-initializes on first call. Extracts and parses the JSON
         string from ``result.content[0].text``.
 
-        Retries automatically on transient server errors (HTTP 500, 502,
-        503, 504) up to ``max_retries`` times with exponential back-off.
+        Retries automatically on transient HTTP server errors (500, 502, 503, 504)
+        up to ``max_retries`` times with exponential back-off. Rate-limit errors
+        (HTTP 429 or JSON-RPC -32000) are NEVER retried at this layer; they raise
+        :class:`HyperpingRateLimitError` immediately so callers can honour
+        ``retry_after``.
         """
-        with self._lock:
-            needs_init = not self._initialized
-        if needs_init:
-            self.initialize()
+        self.initialize()
 
         last_exc: Exception | None = None
         for attempt in range(self._max_retries + 1):
diff --git a/src/hyperping/mcp_client.py b/src/hyperping/mcp_client.py
index 852707a..3341e6d 100644
--- a/src/hyperping/mcp_client.py
+++ b/src/hyperping/mcp_client.py
@@ -62,6 +62,21 @@ def _call(self, tool: str, args: dict[str, Any] | None = None) -> Any:
         """Call an MCP tool via the transport."""
         return self._transport.call_tool(tool, args or {})
 
+    def ensure_initialized(self) -> None:
+        """Perform the MCP handshake now if it hasn't happened yet.
+
+        Useful for startup health checks: call this once on boot and catch
+        :class:`HyperpingRateLimitError` so you can decide whether to start
+        the rest of your service. Subsequent tool calls reuse the handshake.
+
+        Raises:
+            HyperpingRateLimitError: If the server rate-limits ``initialize``,
+                either via HTTP 429 or via the JSON-RPC ``-32000`` rate-limit
+                payload. Inspect ``.retry_after`` to back off.
+            HyperpingAuthError: If the API key is invalid.
+        """
+        self._transport.initialize()
+
     # ==================== Context Manager ====================
 
     def close(self) -> None:
diff --git a/tests/unit/test_async_mcp_client.py b/tests/unit/test_async_mcp_client.py
index 6ab4111..3c76a2d 100644
--- a/tests/unit/test_async_mcp_client.py
+++ b/tests/unit/test_async_mcp_client.py
@@ -2,9 +2,13 @@
 
 from unittest.mock import AsyncMock
 
+import httpx
 import pytest
+import respx
 
 from hyperping._async_mcp_client import AsyncHyperpingMcpClient
+from hyperping._async_mcp_transport import MCP_URL
+from hyperping.exceptions import HyperpingRateLimitError
 from hyperping.models._integration_models import Integration
 from hyperping.models._monitor_models import Monitor
 from hyperping.models._observability_models import MonitorAnomaly, ProbeLogResponse
@@ -282,3 +286,45 @@ async def test_get_integration():
     result = await client.get_integration("int1")
     assert isinstance(result, Integration)
     client._transport.call_tool.assert_called_once_with("get_integration", {"uuid": "int1"})
+
+
+# -- ensure_initialized() (Task 9 / Test 16) ---------------------------------
+
+
+@pytest.mark.asyncio
+async def test_ensure_initialized_delegates_to_transport():
+    client = make_client()
+    await client.ensure_initialized()
+    client._transport.initialize.assert_awaited_once_with()
+
+
+@pytest.mark.asyncio
+async def test_ensure_initialized_propagates_rate_limit():
+    client = make_client()
+    client._transport.initialize.side_effect = HyperpingRateLimitError(
+        "rate limited on initialize", retry_after=30, status_code=200,
+    )
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await client.ensure_initialized()
+    assert exc_info.value.retry_after == 30
+    assert exc_info.value.status_code == 200
+
+
+@pytest.mark.asyncio
+@respx.mock
+async def test_ensure_initialized_real_transport_is_idempotent():
+    """Calling ensure_initialized twice through the real async transport POSTs only once."""
+    route = respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(
+                200,
+                json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+            ),
+            httpx.Response(202),
+        ],
+    )
+    client = AsyncHyperpingMcpClient(api_key="sk_test", base_url=MCP_URL)
+    await client.ensure_initialized()
+    await client.ensure_initialized()
+    assert route.call_count == 2
+    await client.close()
diff --git a/tests/unit/test_async_mcp_transport.py b/tests/unit/test_async_mcp_transport.py
index b7404bc..9165a9a 100644
--- a/tests/unit/test_async_mcp_transport.py
+++ b/tests/unit/test_async_mcp_transport.py
@@ -1,5 +1,6 @@
 """Tests for the async MCP JSON-RPC 2.0 transport layer."""
 
+import asyncio
 import json
 
 import httpx
@@ -300,3 +301,259 @@ async def test_call_tool_result_none_from_202():
     result = await transport.call_tool("some_tool")
     assert result is None
     await transport.close()
+
+
+# -- JSON-RPC rate-limit classification (Task 3 / Tests 2, 18, 21, 22) -------
+
+
+@respx.mock
+async def test_jsonrpc_rate_limit_classified_as_rate_limit_error():
+    """200 + JSON-RPC code=-32000 with rate-limit message -> HyperpingRateLimitError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                    "(5/5 per minute). Retry after 32s.",
+                },
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 32
+    assert exc_info.value.status_code == 200
+    assert "rate limit" in exc_info.value.message.lower()
+    assert exc_info.value.response_body["code"] == -32000
+    await transport.close()
+
+
+@respx.mock
+async def test_jsonrpc_rate_limit_without_retry_after_seconds():
+    """JSON-RPC rate-limit message without parseable seconds -> retry_after=None."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Try again later.",
+                },
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert exc_info.value.retry_after is None
+    await transport.close()
+
+
+@respx.mock
+async def test_jsonrpc_non_ratelimit_error_still_generic_api_error():
+    """Non -32000 JSON-RPC error continues to raise plain HyperpingAPIError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {"code": -32601, "message": "Method not found"},
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingAPIError) as exc_info:
+        await transport.call_tool("nonexistent_tool")
+    assert not isinstance(exc_info.value, HyperpingRateLimitError)
+    await transport.close()
+
+
+@respx.mock
+async def test_jsonrpc_32000_but_not_ratelimit_message_is_generic():
+    """code=-32000 without 'rate limit' substring stays a generic HyperpingAPIError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {"code": -32000, "message": "Some other server error"},
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingAPIError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert not isinstance(exc_info.value, HyperpingRateLimitError)
+    await transport.close()
+
+
+# -- TOCTOU init race + idempotency (Task 5 / Tests 7, 9) --------------------
+
+
+@respx.mock
+async def test_concurrent_first_call_single_initialize():
+    """asyncio.gather of two first call_tool calls triggers exactly one initialize."""
+    route = respx.post(MCP_URL).mock(
+        side_effect=[
+            INIT_RESPONSE,
+            NOTIFICATION_ACCEPTED,
+            _tool_response({"ok": True}, req_id=2),
+            _tool_response({"ok": True}, req_id=3),
+        ],
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    r1, r2 = await asyncio.gather(
+        transport.call_tool("some_tool"),
+        transport.call_tool("some_tool"),
+    )
+    assert r1 == {"ok": True}
+    assert r2 == {"ok": True}
+    assert route.call_count == 4
+    await transport.close()
+
+
+@respx.mock
+async def test_initialize_is_idempotent():
+    """Calling await transport.initialize() twice does not POST twice."""
+    route = respx.post(MCP_URL).mock(
+        side_effect=[INIT_RESPONSE, NOTIFICATION_ACCEPTED],
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    await transport.initialize()
+    await transport.initialize()
+    assert route.call_count == 2
+    await transport.close()
+
+
+# -- Cool-off latch (Task 7 / Tests 11, 13) ----------------------------------
+
+
+@respx.mock
+async def test_initialize_rate_limit_latches_cooloff(monkeypatch):
+    """After a rate-limited initialize, further call_tool calls short-circuit."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._async_mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+
+    rl_response = httpx.Response(
+        200,
+        json={
+            "jsonrpc": "2.0",
+            "id": 1,
+            "error": {
+                "code": -32000,
+                "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                "(5/5 per minute). Retry after 30s.",
+            },
+        },
+    )
+    route = respx.post(MCP_URL).mock(return_value=rl_response)
+
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        await transport.call_tool("some_tool")
+    assert route.call_count == 1
+
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert route.call_count == 1
+    assert exc_info.value.retry_after is not None
+    assert exc_info.value.retry_after >= 1
+
+    await transport.close()
+
+
+@respx.mock
+async def test_initialize_cooloff_clears_after_deadline(monkeypatch):
+    """Once the cool-off elapses, async initialize is attempted again."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._async_mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+
+    rl_body = {
+        "jsonrpc": "2.0",
+        "id": 1,
+        "error": {
+            "code": -32000,
+            "message": "Hyperping MCP rate limit exceeded. Retry after 10s.",
+        },
+    }
+
+    respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(200, json=rl_body),
+            INIT_RESPONSE,
+            NOTIFICATION_ACCEPTED,
+            _tool_response({"ok": True}, req_id=2),
+        ]
+    )
+
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        await transport.call_tool("some_tool")
+
+    with pytest.raises(HyperpingRateLimitError):
+        await transport.call_tool("some_tool")
+
+    fake_now["t"] += 100.0
+    result = await transport.call_tool("some_tool")
+    assert result == {"ok": True}
+    await transport.close()
+
+
+# -- Transient retry never catches rate-limit (Task 8 / Test 25) -------------
+
+
+@respx.mock
+async def test_rate_limit_is_not_retried_by_call_tool():
+    """call_tool's transient retry loop must NOT retry HyperpingRateLimitError."""
+    route = respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            429, text="Rate limited", headers={"retry-after": "5"},
+        ),
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL, max_retries=3)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError):
+        await transport.call_tool("some_tool")
+    assert route.call_count == 1
+    await transport.close()
+
+
+@respx.mock
+async def test_jsonrpc_rate_limit_is_not_retried_by_call_tool():
+    """JSON-RPC -32000 rate-limit (HTTP 200) must NOT trigger retries either."""
+    route = respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Retry after 5s.",
+                },
+            },
+        ),
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL, max_retries=3)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError):
+        await transport.call_tool("some_tool")
+    assert route.call_count == 1
+    await transport.close()
diff --git a/tests/unit/test_mcp_client.py b/tests/unit/test_mcp_client.py
index 64ca92d..4a51a94 100644
--- a/tests/unit/test_mcp_client.py
+++ b/tests/unit/test_mcp_client.py
@@ -1,7 +1,14 @@
 """Tests for the high-level MCP client."""
 
+from pathlib import Path
 from unittest.mock import MagicMock
 
+import httpx
+import pytest
+import respx
+
+from hyperping._mcp_transport import MCP_URL
+from hyperping.exceptions import HyperpingRateLimitError
 from hyperping.mcp_client import HyperpingMcpClient
 from hyperping.models._integration_models import Integration
 from hyperping.models._monitor_models import Monitor
@@ -257,3 +264,85 @@ def test_get_integration():
     result = client.get_integration("int1")
     assert isinstance(result, Integration)
     client._transport.call_tool.assert_called_once_with("get_integration", {"uuid": "int1"})
+
+
+# -- ensure_initialized() (Task 9 / Tests 14, 15) ----------------------------
+
+
+def test_ensure_initialized_delegates_to_transport():
+    client = make_client()
+    client.ensure_initialized()
+    client._transport.initialize.assert_called_once_with()
+
+
+def test_ensure_initialized_propagates_rate_limit():
+    client = make_client()
+    client._transport.initialize.side_effect = HyperpingRateLimitError(
+        "rate limited on initialize", retry_after=30, status_code=200,
+    )
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        client.ensure_initialized()
+    assert exc_info.value.retry_after == 30
+    assert exc_info.value.status_code == 200
+
+
+@respx.mock
+def test_ensure_initialized_real_transport_is_idempotent():
+    """Calling ensure_initialized twice through the real transport POSTs only once."""
+    route = respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(
+                200,
+                json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+            ),
+            httpx.Response(202),
+        ],
+    )
+    client = HyperpingMcpClient(api_key="sk_test", base_url=MCP_URL)
+    client.ensure_initialized()
+    client.ensure_initialized()
+    assert route.call_count == 2  # one initialize, one notification
+    client.close()
+
+
+# -- Docs artifact gate (Task 10/11 / Test 30) -------------------------------
+
+
+def _repo_root() -> Path:
+    # tests/unit/test_mcp_client.py -> repo root
+    return Path(__file__).resolve().parents[2]
+
+
+def test_readme_contains_mcp_rate_limits_section():
+    """README must document the MCP rate limit guidance shipped in this change."""
+    readme = (_repo_root() / "README.md").read_text(encoding="utf-8")
+    assert "### MCP rate limits and connection lifecycle" in readme, (
+        "README is missing the 'MCP rate limits and connection lifecycle' section"
+    )
+
+
+def test_changelog_contains_unreleased_entry():
+    """CHANGELOG must contain an [Unreleased] block with Added and Fixed."""
+    changelog = (_repo_root() / "CHANGELOG.md").read_text(encoding="utf-8")
+    assert "\n## [Unreleased]\n" in changelog or changelog.startswith(
+        "## [Unreleased]\n"
+    ) or "## [Unreleased]" in changelog.splitlines(), (
+        "CHANGELOG is missing an '## [Unreleased]' top-level heading"
+    )
+    # Locate the Unreleased section and confirm subheadings appear within it
+    # (before the next ## release heading).
+    lines = changelog.splitlines()
+    start = None
+    for i, line in enumerate(lines):
+        if line.strip() == "## [Unreleased]":
+            start = i
+            break
+    assert start is not None, "CHANGELOG does not contain '## [Unreleased]'"
+    end = len(lines)
+    for j in range(start + 1, len(lines)):
+        if lines[j].startswith("## [") and lines[j].strip() != "## [Unreleased]":
+            end = j
+            break
+    section = "\n".join(lines[start:end])
+    assert "### Added" in section, "Unreleased section is missing '### Added'"
+    assert "### Fixed" in section, "Unreleased section is missing '### Fixed'"
diff --git a/tests/unit/test_mcp_transport.py b/tests/unit/test_mcp_transport.py
index 88fc570..b8881f6 100644
--- a/tests/unit/test_mcp_transport.py
+++ b/tests/unit/test_mcp_transport.py
@@ -1,6 +1,7 @@
 """Tests for the MCP JSON-RPC 2.0 transport layer."""
 
 import json
+import threading
 from unittest.mock import patch
 
 import httpx
@@ -331,3 +332,352 @@ def test_call_tool_notification_200_returns_none():
     result = transport._send_rpc("notifications/test", is_notification=True)
     assert result is None
     transport.close()
+
+
+# -- JSON-RPC rate-limit classification (Task 2 / Tests 1, 17, 19, 20) --------
+
+
+@respx.mock
+def test_jsonrpc_rate_limit_classified_as_rate_limit_error():
+    """200 + JSON-RPC code=-32000 with rate-limit message -> HyperpingRateLimitError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                    "(5/5 per minute). Retry after 32s.",
+                },
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True  # bypass handshake to exercise _send_rpc directly
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 32
+    assert exc_info.value.status_code == 200
+    assert "rate limit" in exc_info.value.message.lower()
+    assert exc_info.value.response_body["code"] == -32000
+    transport.close()
+
+
+@respx.mock
+def test_jsonrpc_rate_limit_without_retry_after_seconds():
+    """JSON-RPC rate-limit message without parseable seconds -> retry_after=None."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Try again later.",
+                },
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after is None
+    transport.close()
+
+
+@respx.mock
+def test_jsonrpc_non_ratelimit_error_still_generic_api_error():
+    """Non -32000 JSON-RPC error continues to raise plain HyperpingAPIError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {"code": -32601, "message": "Method not found"},
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingAPIError) as exc_info:
+        transport.call_tool("nonexistent_tool")
+    assert not isinstance(exc_info.value, HyperpingRateLimitError)
+    transport.close()
+
+
+@respx.mock
+def test_jsonrpc_32000_but_not_ratelimit_message_is_generic():
+    """code=-32000 without 'rate limit' substring stays a generic HyperpingAPIError."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {"code": -32000, "message": "Some other server error"},
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingAPIError) as exc_info:
+        transport.call_tool("some_tool")
+    assert not isinstance(exc_info.value, HyperpingRateLimitError)
+    transport.close()
+
+
+# -- TOCTOU init race + idempotency (Task 4 / Tests 6, 8) --------------------
+
+
+@respx.mock
+def test_concurrent_first_call_single_initialize():
+    """Two concurrent first-callers must trigger exactly one initialize POST.
+
+    respx's side_effect serializes responses by call order; the two tool
+    replies are interchangeable (both ``{"ok": True}``) so any ordering
+    between the two threads is acceptable.
+    """
+    init_route = respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(
+                200,
+                json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+            ),
+            httpx.Response(202),  # notifications/initialized
+            httpx.Response(
+                200,
+                json={
+                    "jsonrpc": "2.0",
+                    "id": 2,
+                    "result": {"content": [{"type": "text", "text": json.dumps({"ok": True})}]},
+                },
+            ),
+            httpx.Response(
+                200,
+                json={
+                    "jsonrpc": "2.0",
+                    "id": 3,
+                    "result": {"content": [{"type": "text", "text": json.dumps({"ok": True})}]},
+                },
+            ),
+        ]
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    barrier = threading.Barrier(2)
+    results: list[object] = []
+    errors: list[BaseException] = []
+
+    def hit() -> None:
+        barrier.wait()
+        try:
+            results.append(transport.call_tool("some_tool"))
+        except BaseException as exc:  # noqa: BLE001
+            errors.append(exc)
+
+    t1 = threading.Thread(target=hit)
+    t2 = threading.Thread(target=hit)
+    t1.start()
+    t2.start()
+    t1.join()
+    t2.join()
+
+    assert errors == [], f"unexpected errors from threads: {errors!r}"
+    # Expected: exactly 4 POSTs total -- one initialize, one notification, two tool calls.
+    assert init_route.call_count == 4
+    assert results == [{"ok": True}, {"ok": True}]
+    transport.close()
+
+
+@respx.mock
+def test_initialize_is_idempotent():
+    """Calling initialize() twice does not POST twice."""
+    route = respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(
+                200,
+                json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+            ),
+            httpx.Response(202),
+        ]
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport.initialize()
+    transport.initialize()  # second call must be a no-op
+    assert route.call_count == 2  # initialize + notification, not 4
+    transport.close()
+
+
+# -- Cool-off latch (Task 6 / Tests 10, 12) ----------------------------------
+
+
+@respx.mock
+def test_initialize_rate_limit_latches_cooloff(monkeypatch):
+    """After a rate-limited initialize, further call_tool calls short-circuit."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+
+    rl_response = httpx.Response(
+        200,
+        json={
+            "jsonrpc": "2.0",
+            "id": 1,
+            "error": {
+                "code": -32000,
+                "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                "(5/5 per minute). Retry after 30s.",
+            },
+        },
+    )
+    route = respx.post(MCP_URL).mock(return_value=rl_response)
+
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")  # triggers initialize, gets latched
+    assert route.call_count == 1  # only the initialize POST was made
+
+    # Subsequent call_tool must not hit the network.
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert route.call_count == 1  # still 1 -- no further HTTP requests
+    assert exc_info.value.retry_after is not None
+    assert exc_info.value.retry_after >= 1
+
+    transport.close()
+
+
+@respx.mock
+def test_initialize_cooloff_clears_after_deadline(monkeypatch):
+    """Once the cool-off elapses, initialize is attempted again."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+
+    rl_body = {
+        "jsonrpc": "2.0",
+        "id": 1,
+        "error": {
+            "code": -32000,
+            "message": "Hyperping MCP rate limit exceeded. Retry after 10s.",
+        },
+    }
+    ok_init = httpx.Response(
+        200,
+        json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+    )
+    ok_tool = httpx.Response(
+        200,
+        json={
+            "jsonrpc": "2.0",
+            "id": 2,
+            "result": {"content": [{"type": "text", "text": json.dumps({"ok": True})}]},
+        },
+    )
+
+    respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(200, json=rl_body),  # first initialize: rate-limited
+            ok_init,  # second initialize: success
+            httpx.Response(202),  # notifications/initialized
+            ok_tool,  # tool call
+        ]
+    )
+
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+
+    # Still latched.
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+
+    # Advance past the deadline.
+    fake_now["t"] += 100.0
+    result = transport.call_tool("some_tool")
+    assert result == {"ok": True}
+    transport.close()
+
+
+# -- Transient retry never catches rate-limit (Task 8 / Tests 23, 24) --------
+
+
+@respx.mock
+def test_rate_limit_is_not_retried_by_call_tool():
+    """call_tool's transient retry loop must NOT retry HyperpingRateLimitError."""
+    route = respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            429, text="Rate limited", headers={"retry-after": "5"},
+        ),
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL, max_retries=3)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+    assert route.call_count == 1  # NOT 4 -- no retry attempts
+    transport.close()
+
+
+@respx.mock
+def test_jsonrpc_rate_limit_is_not_retried_by_call_tool():
+    """JSON-RPC -32000 rate-limit (HTTP 200) must NOT trigger retries either."""
+    route = respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Retry after 5s.",
+                },
+            },
+        ),
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL, max_retries=3)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+    assert route.call_count == 1
+    transport.close()
+
+
+# -- User repro: six fresh clients (Test 3) ----------------------------------
+
+
+@respx.mock
+def test_six_fresh_clients_under_jsonrpc_rate_limit_all_fail_typed():
+    """User's paste-ready repro: every iteration raises typed rate-limit error."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": 'Hyperping MCP rate limit exceeded for "initialize" '
+                    "(5/5 per minute). Retry after 32s.",
+                },
+            },
+        )
+    )
+    captured: list[HyperpingRateLimitError] = []
+    for _ in range(6):
+        transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+        try:
+            transport.call_tool("list_monitors", {"status": "ssl_expiring"})
+        except HyperpingRateLimitError as exc:
+            captured.append(exc)
+        finally:
+            transport.close()
+    assert len(captured) == 6
+    for exc in captured:
+        assert exc.retry_after == 32
+        assert exc.status_code == 200

From edd6a5b0ec5f9ba2b7fd332df6cbf82a5cc43d47 Mon Sep 17 00:00:00 2001
From: Khaled Salhab <khaled.salhab@develeap.com>
Date: Thu, 21 May 2026 07:13:55 +0300
Subject: [PATCH 5/5] fix(mcp): harden rate-limit handling per review

Addresses the issues surfaced by the unbiased review of #25:

- Tighten the rate-limit marker to "rate limit exceeded" so future server
  messages that merely mention "rate limit" cannot be misclassified.
- Broaden the Retry-After parser to accept "Retry-After: Ns" header-style,
  "retry after N seconds" wordy units, mixed case, and no-units variants.
  Parametrized tests cover the variants.
- Persist the originating status_code (200 vs 429) on the cool-off latch so
  the short-circuit no longer falsely reports 200 for HTTP 429 sources.
- Use math.ceil(remaining) for the cool-off retry_after instead of
  int(remaining)+1; eliminates the systematic +1s over-report.
- Treat retry_after=0 as "no latch" (server says retry now) instead of
  falling back to the 30s default.
- Add a lockless fast path on initialize() so post-handshake call_tool
  invocations do not acquire _init_lock on every call.
- Classify JSON-RPC -32000 rate-limit signals returned on the
  notifications/initialized leg too; previously they were silently swallowed
  by the early notification short-circuit.
- Add response_body to the HTTP 429 path for symmetry with the JSON-RPC path.
- Strengthen tests: concurrency test inspects request bodies for the
  initialize method count, idempotency test asserts call_count before/after
  the second initialize(), cool-off-clears test asserts route.call_count at
  each phase so a regression that hit the network during the latch would fail.
- Add tests for status_code preservation, math.ceil semantics, retry_after=0
  no-latch behavior, the tightened marker, and the notification-leg case.
- Simplify the CHANGELOG docs gate to a single regex assertion.
- Expand HyperpingRateLimitError, ensure_initialized(), and README copy to
  accurately reflect the new behavior; cite the Hyperping MCP docs for the
  "stateless over HTTP" claim and call out that the latch is per-process.

pytest: 482 passed; ruff: clean; mypy --strict: clean; coverage: 95.76%.
---
 CHANGELOG.md                           |  13 +-
 README.md                              |  24 ++-
 src/hyperping/_async_mcp_client.py     |   9 +-
 src/hyperping/_async_mcp_transport.py  | 122 ++++++++---
 src/hyperping/_mcp_transport.py        | 122 ++++++++---
 src/hyperping/exceptions.py            |  19 +-
 src/hyperping/mcp_client.py            |  10 +-
 tests/unit/test_async_mcp_transport.py | 253 +++++++++++++++++++++-
 tests/unit/test_mcp_client.py          |  28 +--
 tests/unit/test_mcp_transport.py       | 286 ++++++++++++++++++++++++-
 10 files changed, 780 insertions(+), 106 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 52eee80..355cdf0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,7 +29,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
   requests that would burn more slots from the bucket.
 - TOCTOU race in lazy `initialize` where two concurrent first calls on the same
   `HyperpingMcpClient` could each POST `initialize`. The handshake is now
-  performed under a dedicated lock with a double-checked flag.
+  performed under a dedicated lock with a double-checked flag, including a
+  lockless fast path so post-handshake `call_tool` does not contend on it.
+- Cool-off short-circuit now preserves the originating status code (200 for
+  JSON-RPC `-32000`, 429 for HTTP 429) so callers can distinguish buckets, and
+  `retry_after` uses `math.ceil` to avoid over-reporting by one second.
+- JSON-RPC rate-limit signals returned on the `notifications/initialized` leg
+  are now classified as `HyperpingRateLimitError` (previously they were
+  silently treated as a successful notification).
+- Rate-limit detection requires the message to contain `"rate limit exceeded"`
+  (the observed phrasing) to avoid false positives on unrelated server messages
+  that happen to mention `"rate limit"`. The `Retry-After` parser now also
+  accepts `Retry-After:` and `retry after N seconds` variants.
 
 ## [1.6.0] - 2026-05-06
 
diff --git a/README.md b/README.md
index cb2f1cc..38091ab 100644
--- a/README.md
+++ b/README.md
@@ -195,13 +195,16 @@ plain dicts/lists; use the exported Pydantic models (e.g., `OnCallSchedule`,
 
 ### MCP rate limits and connection lifecycle
 
-The Hyperping MCP server (`https://api.hyperping.io/v1/mcp`) is stateless over HTTP
+The Hyperping MCP server (`https://api.hyperping.io/v1/mcp`) is
+[documented by Hyperping as stateless over HTTP](https://hyperping.com/mcp)
 and rate-limits per API key. The publicly documented limit is 300 requests per
-minute shared with the REST API, but the server also enforces a separate, low cap
-on the `initialize` handshake (observed around 5/minute). Because every new
-`HyperpingMcpClient` instance must perform the MCP `initialize` handshake on its
-first call, instantiating the client in a hot path or running several short-lived
-processes against one key will trip this cap.
+minute shared with the REST API
+([rate-limit docs](https://hyperping.com/docs/monitoring/api-rate-limits)), but
+the server also enforces a separate, undocumented cap on the `initialize`
+handshake (observed around 5/minute). Because every new `HyperpingMcpClient`
+instance must perform the MCP `initialize` handshake on its first call,
+instantiating the client in a hot path or running several short-lived processes
+against one key will trip this cap.
 
 Operational guidance:
 
@@ -212,7 +215,9 @@ Operational guidance:
   arrive two ways: as HTTP 429 (with a standard `Retry-After` header) and as a
   JSON-RPC server error (`code: -32000`, HTTP 200) on `initialize`. Both surface as
   `HyperpingRateLimitError` with `retry_after` parsed from whichever signal was
-  used. The `status_code` attribute is `429` or `200` respectively.
+  used. The `status_code` attribute is `429` or `200`, matching the underlying
+  signal; cool-off short-circuits preserve the originating status code so callers
+  can disambiguate the two buckets.
 - **Use `ensure_initialized()` for startup health checks.** Calling it once on
   service boot lets you fail fast if the key is already at the `initialize` cap,
   instead of failing on the first business call.
@@ -223,7 +228,10 @@ Operational guidance:
 - **After a rate-limit on `initialize`, the SDK latches a cool-off** so that
   subsequent `call_tool` invocations on the same client fail fast with
   `HyperpingRateLimitError` (no extra HTTP traffic) until `retry_after` elapses.
-  This prevents accidentally burning more slots from the bucket.
+  This prevents accidentally burning more slots from the bucket. The latch is
+  per-`HyperpingMcpClient` instance and per-process; it does not coordinate
+  across separate Python processes sharing the same API key, so multi-process
+  setups still need the workload-separation advice above.
 
 ```python
 from hyperping import HyperpingMcpClient, HyperpingRateLimitError
diff --git a/src/hyperping/_async_mcp_client.py b/src/hyperping/_async_mcp_client.py
index df98061..720a130 100644
--- a/src/hyperping/_async_mcp_client.py
+++ b/src/hyperping/_async_mcp_client.py
@@ -68,12 +68,19 @@ async def ensure_initialized(self) -> None:
 
         Async counterpart to
         :meth:`hyperping.mcp_client.HyperpingMcpClient.ensure_initialized`.
+        Idempotent.
 
         Raises:
             HyperpingRateLimitError: If the server rate-limits ``initialize``,
                 either via HTTP 429 or via the JSON-RPC ``-32000`` rate-limit
                 payload. Inspect ``.retry_after`` to back off.
-            HyperpingAuthError: If the API key is invalid.
+            HyperpingAuthError: If the API key is invalid (HTTP 401/403).
+            HyperpingNotFoundError: If the MCP endpoint URL is wrong
+                (HTTP 404).
+            HyperpingValidationError: If the server rejects the handshake
+                payload (HTTP 400/422; unusual on initialize).
+            HyperpingAPIError: Any other transport-level error (HTTP 5xx,
+                malformed body, etc.).
         """
         await self._transport.initialize()
 
diff --git a/src/hyperping/_async_mcp_transport.py b/src/hyperping/_async_mcp_transport.py
index 05568fb..7f578ef 100644
--- a/src/hyperping/_async_mcp_transport.py
+++ b/src/hyperping/_async_mcp_transport.py
@@ -4,6 +4,7 @@
 
 import asyncio
 import json
+import math
 import re
 import time
 from typing import Any
@@ -23,8 +24,18 @@
 
 _PROTOCOL_VERSION = "2025-03-26"
 
-_MCP_RATE_LIMIT_MARKER = "rate limit"
-_MCP_RATE_LIMIT_RETRY_AFTER_RE = re.compile(r"[Rr]etry after\s+(\d+)\s*s")
+# Tight marker: the server's observed phrasing is "rate limit exceeded ...".
+# Bare "rate limit" would risk classifying messages like "rate limit
+# configuration invalid" as a rate-limit error.
+_MCP_RATE_LIMIT_MARKER = "rate limit exceeded"
+# Accept "Retry after Ns", "Retry-After: Ns", "retry after 30 seconds", etc.
+# Captures only the integer; sub-second values are floored.
+_MCP_RATE_LIMIT_RETRY_AFTER_RE = re.compile(
+    r"retry[\s\-]after[:\s]+(\d+)",
+    re.IGNORECASE,
+)
+# Default cool-off when the server fails to advertise one.
+_COOLOFF_DEFAULT_SECONDS = 30
 
 
 class AsyncMcpTransport:
@@ -56,8 +67,15 @@ def __init__(
         self._initialized = False
         self._request_id = 0
         self._lock = asyncio.Lock()
+        # Separate lock for the handshake so request-id increment and the
+        # initialize() critical section don't contend.
         self._init_lock = asyncio.Lock()
+        # Monotonic deadline (process-local). 0.0 means no latch.
         self._init_blocked_until: float = 0.0
+        # Status code of the original rate-limit response that armed the
+        # latch, propagated through short-circuit raises so callers can tell
+        # whether they hit HTTP 429 or HTTP 200 + JSON-RPC -32000.
+        self._init_blocked_status_code: int = 200
         self._init_result: dict[str, Any] = {}
         self._max_retries = max_retries
 
@@ -102,6 +120,7 @@ async def _send_rpc(
                 "Rate limit exceeded",
                 retry_after=retry_after,
                 status_code=429,
+                response_body={"raw": resp.text[:500]},
             )
         if resp.status_code in (400, 422):
             raise HyperpingValidationError(
@@ -114,37 +133,54 @@ async def _send_rpc(
                 status_code=resp.status_code,
                 response_body={"raw": resp.text[:500]},
             )
+
+        # HTTP 200. Parse the body so we classify JSON-RPC errors (including
+        # rate-limit signals) on notification responses too -- the server can
+        # return 200 + JSON-RPC error on a "notifications/initialized" leg.
+        try:
+            data = resp.json()
+        except (json.JSONDecodeError, ValueError):
+            if is_notification:
+                return None
+            raise HyperpingAPIError(
+                "MCP server returned 200 with non-JSON body",
+                status_code=200,
+                response_body={"raw": resp.text[:500]},
+            ) from None
+
+        if isinstance(data, dict) and "error" in data:
+            self._raise_for_jsonrpc_error(data["error"], resp.status_code)
+
         if is_notification:
             return None
+        return data  # type: ignore[no-any-return]
 
-        data = resp.json()
-        if "error" in data:
-            err = data["error"]
-            if (
-                isinstance(err, dict)
-                and err.get("code") == -32000
-                and isinstance(err.get("message"), str)
-                and _MCP_RATE_LIMIT_MARKER in err["message"].lower()
-            ):
-                rl_retry_after: int | None = None
-                match = _MCP_RATE_LIMIT_RETRY_AFTER_RE.search(err["message"])
-                if match:
-                    try:
-                        rl_retry_after = int(match.group(1))
-                    except ValueError:  # defensive; regex guarantees digits
-                        rl_retry_after = None
-                raise HyperpingRateLimitError(
-                    err["message"],
-                    retry_after=rl_retry_after,
-                    status_code=resp.status_code,
-                    response_body=err,
-                )
-            raise HyperpingAPIError(
-                f"MCP error {err.get('code', '?')}: {err.get('message', 'unknown')}",
-                status_code=resp.status_code,
-                response_body=err,
+    @staticmethod
+    def _raise_for_jsonrpc_error(err: Any, status_code: int) -> None:
+        """Map a JSON-RPC ``error`` payload to a typed exception and raise it."""
+        if (
+            isinstance(err, dict)
+            and err.get("code") == -32000
+            and isinstance(err.get("message"), str)
+            and _MCP_RATE_LIMIT_MARKER in err["message"].lower()
+        ):
+            rl_retry_after: int | None = None
+            match = _MCP_RATE_LIMIT_RETRY_AFTER_RE.search(err["message"])
+            if match:
+                rl_retry_after = int(match.group(1))
+            raise HyperpingRateLimitError(
+                err["message"],
+                retry_after=rl_retry_after,
+                status_code=status_code,
+                response_body=err if isinstance(err, dict) else None,
             )
-        return data  # type: ignore[no-any-return]
+        code = err.get("code", "?") if isinstance(err, dict) else "?"
+        message = err.get("message", "unknown") if isinstance(err, dict) else str(err)
+        raise HyperpingAPIError(
+            f"MCP error {code}: {message}",
+            status_code=status_code,
+            response_body=err if isinstance(err, dict) else None,
+        )
 
     async def initialize(self) -> dict[str, Any]:
         """Async idempotent and concurrency-safe MCP handshake.
@@ -153,7 +189,16 @@ async def initialize(self) -> dict[str, Any]:
         first successful handshake. While an ``initialize`` cool-off latch is
         active, raises :class:`HyperpingRateLimitError` without issuing any
         HTTP request.
+
+        The cool-off latch is per-transport-instance and per-process. It does
+        not coordinate across separate Python processes sharing the same API
+        key; each process keeps its own latch.
         """
+        # Fast path: avoid lock acquisition on every call after the handshake
+        # has succeeded. ``_initialized`` is only assigned True under the lock
+        # after both legs of the handshake, so a True read here is safe.
+        if self._initialized:
+            return self._init_result
         async with self._init_lock:
             if self._initialized:
                 return self._init_result
@@ -161,12 +206,14 @@ async def initialize(self) -> dict[str, Any]:
 
     async def _initialize_locked(self) -> dict[str, Any]:
         """Perform the handshake. Assumes ``self._init_lock`` is held."""
+        # ``time.monotonic`` is used deliberately over ``time.time`` so the
+        # latch is immune to wall-clock jumps (NTP adjustments, suspend/resume).
         remaining = self._init_blocked_until - time.monotonic()
         if remaining > 0:
             raise HyperpingRateLimitError(
                 "MCP initialize rate limit cool-off active; retry later",
-                retry_after=int(remaining) + 1,
-                status_code=200,
+                retry_after=max(math.ceil(remaining), 1),
+                status_code=self._init_blocked_status_code,
             )
         try:
             result = await self._send_rpc(
@@ -179,12 +226,21 @@ async def _initialize_locked(self) -> dict[str, Any]:
             )
             await self._send_rpc("notifications/initialized", is_notification=True)
         except HyperpingRateLimitError as exc:
-            wait = exc.retry_after if exc.retry_after and exc.retry_after > 0 else 30
+            # retry_after=None -> default cool-off; retry_after=0 -> no latch
+            # (the server is telling us we may retry immediately); positive
+            # values are honoured verbatim.
+            if exc.retry_after is None:
+                wait = _COOLOFF_DEFAULT_SECONDS
+            else:
+                wait = max(int(exc.retry_after), 0)
             self._init_blocked_until = time.monotonic() + wait
+            self._init_blocked_status_code = exc.status_code or 200
             raise
         self._init_result = result.get("result", {}) if result else {}
-        self._initialized = True
         self._init_blocked_until = 0.0
+        # Set last so the fast path in initialize() never returns a stale
+        # ``_init_result``.
+        self._initialized = True
         return self._init_result
 
     async def call_tool(
diff --git a/src/hyperping/_mcp_transport.py b/src/hyperping/_mcp_transport.py
index c36333e..0b48be7 100644
--- a/src/hyperping/_mcp_transport.py
+++ b/src/hyperping/_mcp_transport.py
@@ -3,6 +3,7 @@
 from __future__ import annotations
 
 import json
+import math
 import re
 import threading
 import time
@@ -23,8 +24,18 @@
 
 _PROTOCOL_VERSION = "2025-03-26"
 
-_MCP_RATE_LIMIT_MARKER = "rate limit"
-_MCP_RATE_LIMIT_RETRY_AFTER_RE = re.compile(r"[Rr]etry after\s+(\d+)\s*s")
+# Tight marker: the server's observed phrasing is "rate limit exceeded ...".
+# Bare "rate limit" would risk classifying messages like "rate limit
+# configuration invalid" as a rate-limit error.
+_MCP_RATE_LIMIT_MARKER = "rate limit exceeded"
+# Accept "Retry after Ns", "Retry-After: Ns", "retry after 30 seconds", etc.
+# Captures only the integer; sub-second values are floored.
+_MCP_RATE_LIMIT_RETRY_AFTER_RE = re.compile(
+    r"retry[\s\-]after[:\s]+(\d+)",
+    re.IGNORECASE,
+)
+# Default cool-off when the server fails to advertise one.
+_COOLOFF_DEFAULT_SECONDS = 30
 
 
 class McpTransport:
@@ -56,8 +67,15 @@ def __init__(
         self._initialized = False
         self._request_id = 0
         self._lock = threading.Lock()
+        # Separate lock for the handshake so request-id increment and the
+        # initialize() critical section don't contend.
         self._init_lock = threading.Lock()
+        # Monotonic deadline (process-local). 0.0 means no latch.
         self._init_blocked_until: float = 0.0
+        # Status code of the original rate-limit response that armed the
+        # latch, propagated through short-circuit raises so callers can tell
+        # whether they hit HTTP 429 or HTTP 200 + JSON-RPC -32000.
+        self._init_blocked_status_code: int = 200
         self._init_result: dict[str, Any] = {}
         self._max_retries = max_retries
 
@@ -102,6 +120,7 @@ def _send_rpc(
                 "Rate limit exceeded",
                 retry_after=retry_after,
                 status_code=429,
+                response_body={"raw": resp.text[:500]},
             )
         if resp.status_code in (400, 422):
             raise HyperpingValidationError(
@@ -114,37 +133,54 @@ def _send_rpc(
                 status_code=resp.status_code,
                 response_body={"raw": resp.text[:500]},
             )
+
+        # HTTP 200. Parse the body so we classify JSON-RPC errors (including
+        # rate-limit signals) on notification responses too -- the server can
+        # return 200 + JSON-RPC error on a "notifications/initialized" leg.
+        try:
+            data = resp.json()
+        except (json.JSONDecodeError, ValueError):
+            if is_notification:
+                return None
+            raise HyperpingAPIError(
+                "MCP server returned 200 with non-JSON body",
+                status_code=200,
+                response_body={"raw": resp.text[:500]},
+            ) from None
+
+        if isinstance(data, dict) and "error" in data:
+            self._raise_for_jsonrpc_error(data["error"], resp.status_code)
+
         if is_notification:
             return None
+        return data  # type: ignore[no-any-return]
 
-        data = resp.json()
-        if "error" in data:
-            err = data["error"]
-            if (
-                isinstance(err, dict)
-                and err.get("code") == -32000
-                and isinstance(err.get("message"), str)
-                and _MCP_RATE_LIMIT_MARKER in err["message"].lower()
-            ):
-                rl_retry_after: int | None = None
-                match = _MCP_RATE_LIMIT_RETRY_AFTER_RE.search(err["message"])
-                if match:
-                    try:
-                        rl_retry_after = int(match.group(1))
-                    except ValueError:  # defensive; regex guarantees digits
-                        rl_retry_after = None
-                raise HyperpingRateLimitError(
-                    err["message"],
-                    retry_after=rl_retry_after,
-                    status_code=resp.status_code,
-                    response_body=err,
-                )
-            raise HyperpingAPIError(
-                f"MCP error {err.get('code', '?')}: {err.get('message', 'unknown')}",
-                status_code=resp.status_code,
-                response_body=err,
+    @staticmethod
+    def _raise_for_jsonrpc_error(err: Any, status_code: int) -> None:
+        """Map a JSON-RPC ``error`` payload to a typed exception and raise it."""
+        if (
+            isinstance(err, dict)
+            and err.get("code") == -32000
+            and isinstance(err.get("message"), str)
+            and _MCP_RATE_LIMIT_MARKER in err["message"].lower()
+        ):
+            rl_retry_after: int | None = None
+            match = _MCP_RATE_LIMIT_RETRY_AFTER_RE.search(err["message"])
+            if match:
+                rl_retry_after = int(match.group(1))
+            raise HyperpingRateLimitError(
+                err["message"],
+                retry_after=rl_retry_after,
+                status_code=status_code,
+                response_body=err if isinstance(err, dict) else None,
             )
-        return data  # type: ignore[no-any-return]
+        code = err.get("code", "?") if isinstance(err, dict) else "?"
+        message = err.get("message", "unknown") if isinstance(err, dict) else str(err)
+        raise HyperpingAPIError(
+            f"MCP error {code}: {message}",
+            status_code=status_code,
+            response_body=err if isinstance(err, dict) else None,
+        )
 
     def initialize(self) -> dict[str, Any]:
         """Perform MCP handshake if not yet performed. Idempotent and thread-safe.
@@ -153,7 +189,16 @@ def initialize(self) -> dict[str, Any]:
         first successful handshake. While an ``initialize`` cool-off latch is
         active, raises :class:`HyperpingRateLimitError` without issuing any
         HTTP request.
+
+        The cool-off latch is per-transport-instance and per-process. It does
+        not coordinate across separate Python processes sharing the same API
+        key; each process keeps its own latch.
         """
+        # Fast path: avoid lock acquisition on every call after the handshake
+        # has succeeded. ``_initialized`` is only assigned True under the lock
+        # after both legs of the handshake, so a True read here is safe.
+        if self._initialized:
+            return self._init_result
         with self._init_lock:
             if self._initialized:
                 return self._init_result
@@ -161,12 +206,14 @@ def initialize(self) -> dict[str, Any]:
 
     def _initialize_locked(self) -> dict[str, Any]:
         """Perform the handshake. Assumes ``self._init_lock`` is held."""
+        # ``time.monotonic`` is used deliberately over ``time.time`` so the
+        # latch is immune to wall-clock jumps (NTP adjustments, suspend/resume).
         remaining = self._init_blocked_until - time.monotonic()
         if remaining > 0:
             raise HyperpingRateLimitError(
                 "MCP initialize rate limit cool-off active; retry later",
-                retry_after=int(remaining) + 1,
-                status_code=200,
+                retry_after=max(math.ceil(remaining), 1),
+                status_code=self._init_blocked_status_code,
             )
         try:
             result = self._send_rpc(
@@ -179,12 +226,21 @@ def _initialize_locked(self) -> dict[str, Any]:
             )
             self._send_rpc("notifications/initialized", is_notification=True)
         except HyperpingRateLimitError as exc:
-            wait = exc.retry_after if exc.retry_after and exc.retry_after > 0 else 30
+            # retry_after=None -> default cool-off; retry_after=0 -> no latch
+            # (the server is telling us we may retry immediately); positive
+            # values are honoured verbatim.
+            if exc.retry_after is None:
+                wait = _COOLOFF_DEFAULT_SECONDS
+            else:
+                wait = max(int(exc.retry_after), 0)
             self._init_blocked_until = time.monotonic() + wait
+            self._init_blocked_status_code = exc.status_code or 200
             raise
         self._init_result = result.get("result", {}) if result else {}
-        self._initialized = True
         self._init_blocked_until = 0.0
+        # Set last so the fast path in initialize() never returns a stale
+        # ``_init_result``.
+        self._initialized = True
         return self._init_result
 
     def call_tool(
diff --git a/src/hyperping/exceptions.py b/src/hyperping/exceptions.py
index e8fef62..eb34cff 100644
--- a/src/hyperping/exceptions.py
+++ b/src/hyperping/exceptions.py
@@ -50,12 +50,25 @@ class HyperpingNotFoundError(HyperpingAPIError):
 
 
 class HyperpingRateLimitError(HyperpingAPIError):
-    """Raised when the API rate limit is exceeded (HTTP 429).
+    """Raised when an API rate limit is exceeded.
+
+    The REST API and the MCP server both signal rate-limit using this
+    exception. The signal can arrive two ways:
+
+    - HTTP 429 with a standard ``Retry-After`` header (REST and MCP).
+    - HTTP 200 with a JSON-RPC ``-32000`` error whose message contains the
+      string ``"rate limit exceeded"`` and (optionally) ``"Retry after Ns"``
+      (MCP ``initialize``-bucket signal).
+
+    ``status_code`` reflects whichever signal was used (429 or 200). When the
+    MCP cool-off latch short-circuits a subsequent ``initialize`` attempt,
+    ``status_code`` is the status of the original rate-limit response.
 
     Args:
         message: Human-readable error description.
-        retry_after: Seconds to wait before retrying, from the ``Retry-After``
-            response header. ``None`` if the header was absent.
+        retry_after: Seconds to wait before retrying, parsed from the
+            ``Retry-After`` header or the JSON-RPC message body. ``None`` if
+            no value was advertised.
         **kwargs: Forwarded to :class:`HyperpingAPIError`.
     """
 
diff --git a/src/hyperping/mcp_client.py b/src/hyperping/mcp_client.py
index 3341e6d..744a60f 100644
--- a/src/hyperping/mcp_client.py
+++ b/src/hyperping/mcp_client.py
@@ -69,11 +69,19 @@ def ensure_initialized(self) -> None:
         :class:`HyperpingRateLimitError` so you can decide whether to start
         the rest of your service. Subsequent tool calls reuse the handshake.
 
+        Idempotent: calling it more than once is a no-op after success.
+
         Raises:
             HyperpingRateLimitError: If the server rate-limits ``initialize``,
                 either via HTTP 429 or via the JSON-RPC ``-32000`` rate-limit
                 payload. Inspect ``.retry_after`` to back off.
-            HyperpingAuthError: If the API key is invalid.
+            HyperpingAuthError: If the API key is invalid (HTTP 401/403).
+            HyperpingNotFoundError: If the MCP endpoint URL is wrong
+                (HTTP 404).
+            HyperpingValidationError: If the server rejects the handshake
+                payload (HTTP 400/422; unusual on initialize).
+            HyperpingAPIError: Any other transport-level error (HTTP 5xx,
+                malformed body, etc.).
         """
         self._transport.initialize()
 
diff --git a/tests/unit/test_async_mcp_transport.py b/tests/unit/test_async_mcp_transport.py
index 9165a9a..55124aa 100644
--- a/tests/unit/test_async_mcp_transport.py
+++ b/tests/unit/test_async_mcp_transport.py
@@ -422,6 +422,13 @@ async def test_concurrent_first_call_single_initialize():
     assert r1 == {"ok": True}
     assert r2 == {"ok": True}
     assert route.call_count == 4
+    # Inspect bodies to ensure exactly ONE initialize POST. A TOCTOU
+    # regression would send two and the total-count check alone wouldn't
+    # catch it.
+    methods = [json.loads(call.request.content)["method"] for call in route.calls]
+    assert methods.count("initialize") == 1, methods
+    assert methods.count("notifications/initialized") == 1, methods
+    assert methods.count("tools/call") == 2, methods
     await transport.close()
 
 
@@ -433,8 +440,14 @@ async def test_initialize_is_idempotent():
     )
     transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
     await transport.initialize()
-    await transport.initialize()
-    assert route.call_count == 2
+    count_after_first = route.call_count
+    assert count_after_first == 2
+    await transport.initialize()  # second call must be a pure no-op
+    assert route.call_count == count_after_first, (
+        "second initialize() must not issue any HTTP request"
+    )
+    methods = [json.loads(call.request.content)["method"] for call in route.calls]
+    assert methods == ["initialize", "notifications/initialized"]
     await transport.close()
 
 
@@ -494,7 +507,7 @@ async def test_initialize_cooloff_clears_after_deadline(monkeypatch):
         },
     }
 
-    respx.post(MCP_URL).mock(
+    route = respx.post(MCP_URL).mock(
         side_effect=[
             httpx.Response(200, json=rl_body),
             INIT_RESPONSE,
@@ -506,13 +519,18 @@ async def test_initialize_cooloff_clears_after_deadline(monkeypatch):
     transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
     with pytest.raises(HyperpingRateLimitError):
         await transport.call_tool("some_tool")
+    assert route.call_count == 1, "first call_tool must POST initialize exactly once"
 
     with pytest.raises(HyperpingRateLimitError):
         await transport.call_tool("some_tool")
+    assert route.call_count == 1, "latched call_tool must not POST"
 
     fake_now["t"] += 100.0
     result = await transport.call_tool("some_tool")
     assert result == {"ok": True}
+    assert route.call_count == 4, (
+        "post-deadline call_tool must POST initialize + notification + tool"
+    )
     await transport.close()
 
 
@@ -557,3 +575,232 @@ async def test_jsonrpc_rate_limit_is_not_retried_by_call_tool():
         await transport.call_tool("some_tool")
     assert route.call_count == 1
     await transport.close()
+
+
+# -- Retry-After parser robustness (MAJOR-7) ---------------------------------
+
+
+@pytest.mark.parametrize(
+    "message, expected",
+    [
+        ('Hyperping MCP rate limit exceeded. Retry after 32s.', 32),
+        ('Rate limit exceeded for "initialize". Retry-After: 30s', 30),
+        ('Rate limit exceeded. retry after 30 seconds.', 30),
+        ('Rate limit exceeded. RETRY AFTER 7', 7),
+        ('Rate limit exceeded. Retry after 0s', 0),
+        ('Rate limit exceeded. Retry after 1.5s', 1),
+        ('Rate limit exceeded.', None),
+        ('Rate limit exceeded. Try again later.', None),
+    ],
+)
+@respx.mock
+async def test_jsonrpc_rate_limit_retry_after_parser_variants(message, expected):
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {"code": -32000, "message": message},
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == expected
+    await transport.close()
+
+
+@respx.mock
+async def test_jsonrpc_rate_limit_marker_requires_exceeded():
+    """A -32000 mentioning 'rate limit' but not 'exceeded' stays generic."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Bad rate limit configuration in monitor",
+                },
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingAPIError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert not isinstance(exc_info.value, HyperpingRateLimitError)
+    await transport.close()
+
+
+# -- Notification leg rate-limit classified (MAJOR-1) ------------------------
+
+
+@respx.mock
+async def test_notifications_initialized_rate_limit_classified(monkeypatch):
+    """A 200 + -32000 on notifications/initialized raises HyperpingRateLimitError."""
+    monkeypatch.setattr(
+        "hyperping._async_mcp_transport.time.monotonic", lambda: 1000.0
+    )
+    respx.post(MCP_URL).mock(
+        side_effect=[
+            INIT_RESPONSE,
+            httpx.Response(
+                200,
+                json={
+                    "jsonrpc": "2.0",
+                    "error": {
+                        "code": -32000,
+                        "message": "Hyperping MCP rate limit exceeded. Retry after 5s.",
+                    },
+                },
+            ),
+        ]
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 5
+    assert exc_info.value.status_code == 200
+    assert transport._init_blocked_until > 0
+    await transport.close()
+
+
+# -- Cool-off preserves originating status_code (MAJOR-2) --------------------
+
+
+@respx.mock
+async def test_cooloff_short_circuit_preserves_429_status_code(monkeypatch):
+    """A latch armed by HTTP 429 must short-circuit with status_code=429."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._async_mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            429,
+            text="Too many initializes",
+            headers={"retry-after": "30"},
+        ),
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError) as e1:
+        await transport.call_tool("some_tool")
+    assert e1.value.status_code == 429
+    with pytest.raises(HyperpingRateLimitError) as e2:
+        await transport.call_tool("some_tool")
+    assert e2.value.status_code == 429
+    await transport.close()
+
+
+@respx.mock
+async def test_cooloff_short_circuit_preserves_200_status_code(monkeypatch):
+    """A latch armed by JSON-RPC -32000 short-circuits with status_code=200."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._async_mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Retry after 30s.",
+                },
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError) as e1:
+        await transport.call_tool("some_tool")
+    assert e1.value.status_code == 200
+    with pytest.raises(HyperpingRateLimitError) as e2:
+        await transport.call_tool("some_tool")
+    assert e2.value.status_code == 200
+    await transport.close()
+
+
+# -- math.ceil semantics on cool-off short-circuit (MAJOR-3) -----------------
+
+
+@respx.mock
+async def test_cooloff_short_circuit_uses_math_ceil_not_plus_one(monkeypatch):
+    """When server advertises Retry-After: 30, short-circuit returns 30, not 31."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._async_mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Retry after 30s.",
+                },
+            },
+        )
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        await transport.call_tool("some_tool")
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 30, (
+        f"expected exactly 30 (math.ceil), got {exc_info.value.retry_after}"
+    )
+    fake_now["t"] += 0.6
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 30
+    fake_now["t"] += 29.0
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        await transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 1
+    await transport.close()
+
+
+# -- retry_after=0 means "no latch" (MINOR-2) --------------------------------
+
+
+@respx.mock
+async def test_jsonrpc_rate_limit_with_retry_after_zero_does_not_latch(monkeypatch):
+    """retry_after=0 from server means retry-now; do not set a 30s default."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._async_mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+    respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(
+                200,
+                json={
+                    "jsonrpc": "2.0",
+                    "id": 1,
+                    "error": {
+                        "code": -32000,
+                        "message": "Hyperping MCP rate limit exceeded. Retry after 0s.",
+                    },
+                },
+            ),
+            INIT_RESPONSE,
+            NOTIFICATION_ACCEPTED,
+            _tool_response({"ok": True}, req_id=2),
+        ]
+    )
+    transport = AsyncMcpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        await transport.call_tool("some_tool")
+    assert transport._init_blocked_until <= fake_now["t"]
+    result = await transport.call_tool("some_tool")
+    assert result == {"ok": True}
+    await transport.close()
diff --git a/tests/unit/test_mcp_client.py b/tests/unit/test_mcp_client.py
index 4a51a94..3de9c9d 100644
--- a/tests/unit/test_mcp_client.py
+++ b/tests/unit/test_mcp_client.py
@@ -323,26 +323,18 @@ def test_readme_contains_mcp_rate_limits_section():
 
 def test_changelog_contains_unreleased_entry():
     """CHANGELOG must contain an [Unreleased] block with Added and Fixed."""
+    import re
+
     changelog = (_repo_root() / "CHANGELOG.md").read_text(encoding="utf-8")
-    assert "\n## [Unreleased]\n" in changelog or changelog.startswith(
-        "## [Unreleased]\n"
-    ) or "## [Unreleased]" in changelog.splitlines(), (
+    assert re.search(r"^## \[Unreleased\]\s*$", changelog, re.MULTILINE), (
         "CHANGELOG is missing an '## [Unreleased]' top-level heading"
     )
-    # Locate the Unreleased section and confirm subheadings appear within it
-    # (before the next ## release heading).
-    lines = changelog.splitlines()
-    start = None
-    for i, line in enumerate(lines):
-        if line.strip() == "## [Unreleased]":
-            start = i
-            break
-    assert start is not None, "CHANGELOG does not contain '## [Unreleased]'"
-    end = len(lines)
-    for j in range(start + 1, len(lines)):
-        if lines[j].startswith("## [") and lines[j].strip() != "## [Unreleased]":
-            end = j
-            break
-    section = "\n".join(lines[start:end])
+    # Slice the Unreleased section: from its heading up to the next ## release.
+    match = re.search(
+        r"^## \[Unreleased\]\s*\n(.*?)(?=^## \[)",
+        changelog,
+        re.MULTILINE | re.DOTALL,
+    )
+    section = match.group(1) if match else changelog.split("## [Unreleased]", 1)[-1]
     assert "### Added" in section, "Unreleased section is missing '### Added'"
     assert "### Fixed" in section, "Unreleased section is missing '### Fixed'"
diff --git a/tests/unit/test_mcp_transport.py b/tests/unit/test_mcp_transport.py
index b8881f6..b38c45a 100644
--- a/tests/unit/test_mcp_transport.py
+++ b/tests/unit/test_mcp_transport.py
@@ -489,6 +489,14 @@ def hit() -> None:
     assert errors == [], f"unexpected errors from threads: {errors!r}"
     # Expected: exactly 4 POSTs total -- one initialize, one notification, two tool calls.
     assert init_route.call_count == 4
+    # Inspect bodies to ensure exactly ONE was the initialize handshake. A
+    # regression that fired two "initialize" POSTs and skipped the notification
+    # would still total 4, so the count alone is not sufficient evidence that
+    # the TOCTOU race is closed.
+    methods = [json.loads(call.request.content)["method"] for call in init_route.calls]
+    assert methods.count("initialize") == 1, methods
+    assert methods.count("notifications/initialized") == 1, methods
+    assert methods.count("tools/call") == 2, methods
     assert results == [{"ok": True}, {"ok": True}]
     transport.close()
 
@@ -507,8 +515,15 @@ def test_initialize_is_idempotent():
     )
     transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
     transport.initialize()
-    transport.initialize()  # second call must be a no-op
-    assert route.call_count == 2  # initialize + notification, not 4
+    count_after_first = route.call_count
+    assert count_after_first == 2  # initialize + notification
+    transport.initialize()  # second call must be a pure no-op
+    assert route.call_count == count_after_first, (
+        "second initialize() must not issue any HTTP request"
+    )
+    # And the request methods are exactly what we expect, in order.
+    methods = [json.loads(call.request.content)["method"] for call in route.calls]
+    assert methods == ["initialize", "notifications/initialized"]
     transport.close()
 
 
@@ -581,7 +596,7 @@ def test_initialize_cooloff_clears_after_deadline(monkeypatch):
         },
     )
 
-    respx.post(MCP_URL).mock(
+    route = respx.post(MCP_URL).mock(
         side_effect=[
             httpx.Response(200, json=rl_body),  # first initialize: rate-limited
             ok_init,  # second initialize: success
@@ -593,15 +608,20 @@ def test_initialize_cooloff_clears_after_deadline(monkeypatch):
     transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
     with pytest.raises(HyperpingRateLimitError):
         transport.call_tool("some_tool")
+    assert route.call_count == 1, "first call_tool must POST initialize exactly once"
 
-    # Still latched.
+    # Still latched: zero further HTTP traffic until the cool-off elapses.
     with pytest.raises(HyperpingRateLimitError):
         transport.call_tool("some_tool")
+    assert route.call_count == 1, "latched call_tool must not POST"
 
-    # Advance past the deadline.
+    # Advance past the deadline. Now initialize + notification + tool call fire.
     fake_now["t"] += 100.0
     result = transport.call_tool("some_tool")
     assert result == {"ok": True}
+    assert route.call_count == 4, (
+        "post-deadline call_tool must POST initialize + notification + tool"
+    )
     transport.close()
 
 
@@ -681,3 +701,259 @@ def test_six_fresh_clients_under_jsonrpc_rate_limit_all_fail_typed():
     for exc in captured:
         assert exc.retry_after == 32
         assert exc.status_code == 200
+
+
+# -- Retry-After parser robustness (MAJOR-7) ---------------------------------
+
+
+@pytest.mark.parametrize(
+    "message, expected",
+    [
+        ('Hyperping MCP rate limit exceeded. Retry after 32s.', 32),
+        ('Rate limit exceeded for "initialize". Retry-After: 30s', 30),
+        ('Rate limit exceeded. retry after 30 seconds.', 30),
+        ('Rate limit exceeded. RETRY AFTER 7', 7),
+        ('Rate limit exceeded. Retry after 0s', 0),
+        ('Rate limit exceeded. Retry after 1.5s', 1),  # sub-second floored
+        ('Rate limit exceeded.', None),  # no advertised value
+        ('Rate limit exceeded. Try again later.', None),  # graceful
+    ],
+)
+@respx.mock
+def test_jsonrpc_rate_limit_retry_after_parser_variants(message, expected):
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {"code": -32000, "message": message},
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == expected
+    transport.close()
+
+
+# -- Marker tightness: "rate limit" without "exceeded" is NOT a rate-limit ---
+
+
+@respx.mock
+def test_jsonrpc_rate_limit_marker_requires_exceeded():
+    """A -32000 mentioning 'rate limit' but not 'exceeded' stays generic."""
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Bad rate limit configuration in monitor",
+                },
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    transport._initialized = True
+    with pytest.raises(HyperpingAPIError) as exc_info:
+        transport.call_tool("some_tool")
+    assert not isinstance(exc_info.value, HyperpingRateLimitError)
+    transport.close()
+
+
+# -- Notification leg rate-limit classified (MAJOR-1) ------------------------
+
+
+@respx.mock
+def test_notifications_initialized_rate_limit_classified(monkeypatch):
+    """A 200 + -32000 on notifications/initialized raises HyperpingRateLimitError."""
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: 1000.0
+    )
+    respx.post(MCP_URL).mock(
+        side_effect=[
+            # initialize succeeds
+            httpx.Response(
+                200,
+                json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+            ),
+            # notification gets rate-limited via JSON-RPC error
+            httpx.Response(
+                200,
+                json={
+                    "jsonrpc": "2.0",
+                    "error": {
+                        "code": -32000,
+                        "message": "Hyperping MCP rate limit exceeded. Retry after 5s.",
+                    },
+                },
+            ),
+        ]
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 5
+    assert exc_info.value.status_code == 200
+    # The latch must be armed by the second-leg failure, so the next
+    # call_tool short-circuits without HTTP traffic.
+    assert transport._init_blocked_until > 0
+    transport.close()
+
+
+# -- Cool-off preserves originating status_code (MAJOR-2) --------------------
+
+
+@respx.mock
+def test_cooloff_short_circuit_preserves_429_status_code(monkeypatch):
+    """A latch armed by HTTP 429 must short-circuit with status_code=429."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            429,
+            text="Too many initializes",
+            headers={"retry-after": "30"},
+        ),
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError) as e1:
+        transport.call_tool("some_tool")
+    assert e1.value.status_code == 429
+    # Second call within the window: short-circuit must preserve status_code=429.
+    with pytest.raises(HyperpingRateLimitError) as e2:
+        transport.call_tool("some_tool")
+    assert e2.value.status_code == 429
+    transport.close()
+
+
+@respx.mock
+def test_cooloff_short_circuit_preserves_200_status_code(monkeypatch):
+    """A latch armed by JSON-RPC -32000 short-circuits with status_code=200."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Retry after 30s.",
+                },
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError) as e1:
+        transport.call_tool("some_tool")
+    assert e1.value.status_code == 200
+    with pytest.raises(HyperpingRateLimitError) as e2:
+        transport.call_tool("some_tool")
+    assert e2.value.status_code == 200
+    transport.close()
+
+
+# -- math.ceil semantics on cool-off short-circuit (MAJOR-3) -----------------
+
+
+@respx.mock
+def test_cooloff_short_circuit_uses_math_ceil_not_plus_one(monkeypatch):
+    """When server advertises Retry-After: 30, short-circuit returns 30, not 31."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+    respx.post(MCP_URL).mock(
+        return_value=httpx.Response(
+            200,
+            json={
+                "jsonrpc": "2.0",
+                "id": 1,
+                "error": {
+                    "code": -32000,
+                    "message": "Hyperping MCP rate limit exceeded. Retry after 30s.",
+                },
+            },
+        )
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")  # arms latch with wait=30
+    # Same monotonic moment: remaining is exactly 30.0 -> ceil(30.0) == 30.
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 30, (
+        f"expected exactly 30 (math.ceil), got {exc_info.value.retry_after}"
+    )
+    # Advance a fractional amount: remaining = 29.4 -> ceil = 30.
+    fake_now["t"] += 0.6
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 30
+    # Advance to within 1s of the deadline: remaining = 0.4 -> ceil = 1.
+    fake_now["t"] += 29.0
+    with pytest.raises(HyperpingRateLimitError) as exc_info:
+        transport.call_tool("some_tool")
+    assert exc_info.value.retry_after == 1
+    transport.close()
+
+
+# -- retry_after=0 means "no latch" (MINOR-2) --------------------------------
+
+
+@respx.mock
+def test_jsonrpc_rate_limit_with_retry_after_zero_does_not_latch(monkeypatch):
+    """retry_after=0 from server means retry-now; do not set a 30s default."""
+    fake_now = {"t": 1000.0}
+    monkeypatch.setattr(
+        "hyperping._mcp_transport.time.monotonic", lambda: fake_now["t"]
+    )
+    ok_init = httpx.Response(
+        200,
+        json={"jsonrpc": "2.0", "id": 1, "result": {"protocolVersion": "2025-03-26"}},
+    )
+    ok_tool = httpx.Response(
+        200,
+        json={
+            "jsonrpc": "2.0",
+            "id": 2,
+            "result": {"content": [{"type": "text", "text": json.dumps({"ok": True})}]},
+        },
+    )
+    respx.post(MCP_URL).mock(
+        side_effect=[
+            httpx.Response(
+                200,
+                json={
+                    "jsonrpc": "2.0",
+                    "id": 1,
+                    "error": {
+                        "code": -32000,
+                        "message": "Hyperping MCP rate limit exceeded. Retry after 0s.",
+                    },
+                },
+            ),
+            ok_init,
+            httpx.Response(202),
+            ok_tool,
+        ]
+    )
+    transport = McpTransport(api_key="sk_test", base_url=MCP_URL)
+    with pytest.raises(HyperpingRateLimitError):
+        transport.call_tool("some_tool")
+    # Latch deadline equals "now" so the next call is permitted immediately.
+    assert transport._init_blocked_until <= fake_now["t"]
+    result = transport.call_tool("some_tool")
+    assert result == {"ok": True}
+    transport.close()