Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -36,3 +36,4 @@ htmlcov/
# Local dev tooling
.claude/
dist/
.worktrees/
37 changes: 37 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,43 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- `ensure_initialized()` on `HyperpingMcpClient` and `AsyncHyperpingMcpClient` for
startup health checks. Performs the MCP handshake now if it hasn't happened yet
and raises `HyperpingRateLimitError` if the server's `initialize` cap is hit.
- New "MCP rate limits and connection lifecycle" section in README documenting
Hyperping's stateless MCP server, the undocumented `initialize` cap, and the
recommended client lifetime per process.

### Fixed

- MCP rate-limit errors that the server returns as HTTP 200 with JSON-RPC
`error.code = -32000` (notably the `initialize` per-minute cap) are now
classified as `HyperpingRateLimitError` with `retry_after` parsed from the
message, instead of a generic `HyperpingAPIError`. Existing HTTP 429 handling is
unchanged.
- After a rate-limit on `initialize`, the MCP transport latches a cool-off so
subsequent `call_tool` invocations short-circuit with `HyperpingRateLimitError`
until the advertised `retry_after` elapses, instead of issuing further HTTP
requests that would burn more slots from the bucket.
- TOCTOU race in lazy `initialize` where two concurrent first calls on the same
`HyperpingMcpClient` could each POST `initialize`. The handshake is now
performed under a dedicated lock with a double-checked flag, including a
lockless fast path so post-handshake `call_tool` does not contend on it.
- Cool-off short-circuit now preserves the originating status code (200 for
JSON-RPC `-32000`, 429 for HTTP 429) so callers can distinguish buckets, and
`retry_after` uses `math.ceil` to avoid over-reporting by one second.
- JSON-RPC rate-limit signals returned on the `notifications/initialized` leg
are now classified as `HyperpingRateLimitError` (previously they were
silently treated as a successful notification).
- Rate-limit detection requires the message to contain `"rate limit exceeded"`
(the observed phrasing) to avoid false positives on unrelated server messages
that happen to mention `"rate limit"`. The `Retry-After` parser now also
accepts `Retry-After:` and `retry after N seconds` variants.
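
The double-checked handshake described in the TOCTOU entry above can be illustrated with a minimal sketch. `LazyHandshake` and its `do_handshake` callable are hypothetical names used only for illustration; this is not the SDK's actual transport code, just one common way to get a locked slow path and a lockless fast path.

```python
import threading


class LazyHandshake:
    """Hypothetical stand-in for a transport's lazy `initialize` logic."""

    def __init__(self, do_handshake):
        self._do_handshake = do_handshake  # assumed callable that POSTs `initialize`
        self._lock = threading.Lock()
        self._initialized = False

    def ensure_initialized(self):
        # Lockless fast path: once the flag is set, callers never touch the lock.
        if self._initialized:
            return
        with self._lock:
            # Re-check under the lock: another thread may have completed the
            # handshake while this one was waiting.
            if self._initialized:
                return
            self._do_handshake()
            # Set the flag only after success, so a failed handshake is retried.
            self._initialized = True
```

If `do_handshake` raises, the flag stays unset and the next caller retries, which is why a cool-off latch is still needed on top of this to avoid hammering the `initialize` cap.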

## [1.6.0] - 2026-05-06

### Added
53 changes: 53 additions & 0 deletions README.md
@@ -193,6 +193,59 @@ The MCP client uses the same API key as `HyperpingClient`. All methods return
plain dicts/lists; use the exported Pydantic models (e.g., `OnCallSchedule`,
`EscalationPolicy`) for validation if needed.

### MCP rate limits and connection lifecycle

The Hyperping MCP server (`https://api.hyperping.io/v1/mcp`) is
[documented by Hyperping as stateless over HTTP](https://hyperping.com/mcp)
and rate-limits per API key. The publicly documented limit is 300 requests per
minute shared with the REST API
([rate-limit docs](https://hyperping.com/docs/monitoring/api-rate-limits)), but
the server also enforces a separate, undocumented cap on the `initialize`
handshake (observed around 5/minute). Because every new `HyperpingMcpClient`
instance must perform the MCP `initialize` handshake on its first call,
instantiating the client in a hot path or running several short-lived processes
against one key will trip this cap.

Operational guidance:

- **Create one `HyperpingMcpClient` per process and reuse it.** Do not instantiate
it inside a loop. The first call performs the handshake; later calls on the same
client skip it for the life of that client.
- **Catch `HyperpingRateLimitError` and honour `retry_after`.** Rate-limit signals
arrive two ways: as HTTP 429 (with a standard `Retry-After` header) and as a
JSON-RPC server error (`code: -32000`, HTTP 200) on `initialize`. Both surface as
`HyperpingRateLimitError` with `retry_after` parsed from whichever signal was
used. The `status_code` attribute is `429` or `200`, matching the underlying
signal; cool-off short-circuits preserve the originating status code so callers
can disambiguate the two buckets.
- **Use `ensure_initialized()` for startup health checks.** Calling it once on
service boot lets you fail fast if the key is already at the `initialize` cap,
instead of failing on the first business call.
- **Several workloads on one key collide on the `initialize` cap.** A weekly cron,
a watchdog daemon, and a developer running the CLI cannot all warm up the same
API key inside one minute. Use one long-lived process per workload, or separate
API keys per workload if your plan allows.
- **After a rate-limit on `initialize`, the SDK latches a cool-off** so that
subsequent `call_tool` invocations on the same client fail fast with
`HyperpingRateLimitError` (no extra HTTP traffic) until `retry_after` elapses.
This prevents accidentally burning more slots from the bucket. The latch is
per-`HyperpingMcpClient` instance and per-process; it does not coordinate
across separate Python processes sharing the same API key, so multi-process
setups still need the workload-separation advice above.
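
To make the two rate-limit signals concrete, here is a rough sketch of how a JSON-RPC `-32000` payload might be recognised and its retry hint extracted. The function names, the regular expression, and the example message wording are assumptions for illustration; the SDK's real parser and the server's exact phrasing may differ.

```python
import math
import re

# Assumed message shapes ("Retry-After: N", "retry after N seconds");
# the real server wording may differ.
_RETRY_RE = re.compile(r"retry[\s-]*after:?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_retry_after(message):
    """Return whole seconds to wait, or None if the message carries no hint."""
    match = _RETRY_RE.search(message)
    if match is None:
        return None
    # Round up so a fractional hint never yields a wait that is too short.
    return math.ceil(float(match.group(1)))


def is_rate_limit_error(payload):
    """True for a JSON-RPC error with code -32000 and the observed phrasing."""
    error = payload.get("error") or {}
    message = str(error.get("message", "")).lower()
    return error.get("code") == -32000 and "rate limit exceeded" in message
```

Requiring the full phrase `"rate limit exceeded"` rather than any mention of `"rate limit"` keeps unrelated `-32000` server errors from being misclassified.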

```python
from hyperping import HyperpingMcpClient, HyperpingRateLimitError

mcp = HyperpingMcpClient(api_key="sk_...")
try:
    mcp.ensure_initialized()
except HyperpingRateLimitError as e:
    print(f"MCP cold-start rate-limited; retry in {e.retry_after}s")
    raise

summary = mcp.get_status_summary()
```
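
The cool-off latch mentioned in the guidance can be pictured as a small per-client timer. The `CoolOff` class below is a hypothetical sketch, not the SDK's implementation; it assumes a monotonic clock and reports the remaining wait rounded up with `math.ceil`.

```python
import math
import time


class CoolOff:
    """Hypothetical per-client latch; not the SDK's actual implementation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the latch can be tested without sleeping
        self._until = 0.0
        self.status_code = None  # 200 (JSON-RPC -32000) or 429 (HTTP 429)

    def trip(self, retry_after, status_code):
        # Remember when the cool-off ends and which signal caused it.
        self._until = self._clock() + retry_after
        self.status_code = status_code

    def remaining(self):
        """Whole seconds left in the cool-off, or 0 once it has elapsed."""
        return max(0, math.ceil(self._until - self._clock()))
```

A transport built this way would check `remaining()` before each request and raise locally while it is non-zero. Because the state lives in one object, it cannot coordinate across processes sharing an API key.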

### Healthchecks

```python