develeap · KhaledSalhab-Develeap · May 21, 2026 · May 20, 2026
diff --git a/.gitignore b/.gitignore
@@ -36,3 +36,4 @@ htmlcov/
 # Local dev tooling
 .claude/
 dist/
+.worktrees/
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,43 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [Unreleased]
+
+### Added
+
+- `ensure_initialized()` on `HyperpingMcpClient` and `AsyncHyperpingMcpClient` for
+  startup health checks. Performs the MCP handshake now if it hasn't happened yet
+  and raises `HyperpingRateLimitError` if the server's `initialize` cap is hit.
+- New "MCP rate limits and connection lifecycle" section in README documenting
+  Hyperping's stateless MCP server, the undocumented `initialize` cap, and the
+  recommended client lifetime per process.
+
+### Fixed
+
+- MCP rate-limit errors that the server returns as HTTP 200 with JSON-RPC
+  `error.code = -32000` (notably the `initialize` per-minute cap) are now
+  classified as `HyperpingRateLimitError` with `retry_after` parsed from the
+  message, instead of a generic `HyperpingAPIError`. Existing HTTP 429 handling is
+  unchanged.
+- After a rate-limit on `initialize`, the MCP transport latches a cool-off so
+  subsequent `call_tool` invocations short-circuit with `HyperpingRateLimitError`
+  until the advertised `retry_after` elapses, instead of issuing further HTTP
+  requests that would burn more slots from the bucket.
+- TOCTOU race in lazy `initialize` where two concurrent first calls on the same
+  `HyperpingMcpClient` could each POST `initialize`. The handshake is now
+  performed under a dedicated lock with a double-checked flag, including a
+  lockless fast path so post-handshake `call_tool` does not contend on it.
+- Cool-off short-circuit now preserves the originating status code (200 for
+  JSON-RPC `-32000`, 429 for HTTP 429) so callers can distinguish buckets, and
+  `retry_after` uses `math.ceil` to avoid over-reporting by one second.
+- JSON-RPC rate-limit signals returned on the `notifications/initialized` leg
+  are now classified as `HyperpingRateLimitError` (previously they were
+  silently treated as a successful notification).
+- Rate-limit detection now requires the message to contain
+  `"rate limit exceeded"` (the observed phrasing) to avoid false positives on
+  unrelated server messages that merely mention `"rate limit"`. The `Retry-After`
+  parser now also accepts `Retry-After:` and `retry after N seconds` variants.
+
 ## [1.6.0] - 2026-05-06
 
 ### Added

diff --git a/README.md b/README.md
@@ -193,6 +193,59 @@ The MCP client uses the same API key as `HyperpingClient`. All methods return
 plain dicts/lists; use the exported Pydantic models (e.g., `OnCallSchedule`,
 `EscalationPolicy`) for validation if needed.
 
+### MCP rate limits and connection lifecycle
+
+The Hyperping MCP server (`https://api.hyperping.io/v1/mcp`) is
+[documented by Hyperping as stateless over HTTP](https://hyperping.com/mcp)
+and rate-limits per API key. The publicly documented limit is 300 requests per
+minute shared with the REST API
+([rate-limit docs](https://hyperping.com/docs/monitoring/api-rate-limits)), but
+the server also enforces a separate, undocumented cap on the `initialize`
+handshake (observed around 5/minute). Because every new `HyperpingMcpClient`
+instance must perform the MCP `initialize` handshake on its first call,
+instantiating the client in a hot path or running several short-lived processes
+against one key will trip this cap.
+
+Operational guidance:
+
+- **Create one `HyperpingMcpClient` per process and reuse it.** Do not instantiate
+  it inside a loop. The first call performs the handshake; subsequent calls reuse
+  it for the life of the client.
+- **Catch `HyperpingRateLimitError` and honour `retry_after`.** Rate-limit signals
+  arrive in two ways: as HTTP 429 (with a standard `Retry-After` header) and as a
+  JSON-RPC server error (`code: -32000`, HTTP 200) on `initialize`. Both surface as
+  `HyperpingRateLimitError` with `retry_after` parsed from whichever signal was
+  received. The `status_code` attribute is `429` or `200`, matching the underlying
+  signal; cool-off short-circuits preserve the originating status code so callers
+  can disambiguate the two buckets.
+- **Use `ensure_initialized()` for startup health checks.** Calling it once on
+  service boot lets you fail fast if the key is already at the `initialize` cap,
+  instead of failing on the first business call.
+- **Several workloads on one key collide on the `initialize` cap.** A weekly cron,
+  a watchdog daemon, and a developer running the CLI cannot all warm up the same
+  API key inside one minute. Use one long-lived process per workload, or separate
+  API keys per workload if your plan allows.
+- **After a rate-limit on `initialize`, the SDK latches a cool-off** so that
+  subsequent `call_tool` invocations on the same client fail fast with
+  `HyperpingRateLimitError` (no extra HTTP traffic) until `retry_after` elapses.
+  This prevents accidentally burning more slots from the bucket. The latch is
+  per-`HyperpingMcpClient` instance and per-process; it does not coordinate
+  across separate Python processes sharing the same API key, so multi-process
+  setups still need the workload-separation advice above.
+
+```python
+from hyperping import HyperpingMcpClient, HyperpingRateLimitError
+
+mcp = HyperpingMcpClient(api_key="sk_...")
+try:
+    mcp.ensure_initialized()
+except HyperpingRateLimitError as e:
+    print(f"MCP cold-start rate-limited; retry in {e.retry_after}s")
+    raise
+
+summary = mcp.get_status_summary()
+```
+
 ### Healthchecks
 
 ```python