status: distinguish API down from unreachable #170
When /status returns a non-2xx response the API is reachable but unhealthy, so report it as down rather than reusing the generic 'could not reach' message used for transport errors.
5c1365d to cfc2591
…broken
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
cfc2591 to ad44904
Monitoring Plan: Improve
masnwilliams left a comment
approved: nicely distinguishes API-down from status-unavailable. verified /health exists (it's registered in the API and used as Railway's healthcheck path) and shares the same base URL as /status, so the probe is apples-to-apples.
nit (cmd/status.go:53): the secondary /health probe reuses the same 10s client timeout, so a fully hung API could take ~20s before the CLI prints anything. consider a shorter timeout on the fallback probe.
Use a dedicated http.Client for the /health probe with a 3s timeout instead of sharing the 10s client used for /status. Caps worst-case wait during a genuine outage (both /status and /health hung) at ~13s instead of ~20s.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
addressed the timeout nit in 3d48cd0: a dedicated healthClient with a 3s timeout for the probe caps the worst-case wait at ~13s. lmk if you want to take another look before i merge.
Summary
Splits the failure handling for `kernel status` so the message accurately reflects what happened, instead of collapsing both transport errors and non-2xx responses into one ambiguous "Could not reach Kernel API" line.

Three failure cases now:

- Transport error on `/status` (network down, DNS, timeout) → "Could not reach Kernel API. Check https://status.kernel.sh for updates."
- Non-2xx from `/status`, `/health` also unhealthy → "Kernel API is down. Check https://status.kernel.sh for updates."
- Non-2xx from `/status`, `/health` returns 200 → "Kernel API is responding but /status is unavailable. Check https://status.kernel.sh for updates."

The `/health` probe matters because `/status` has an upstream dependency on incident.io, so a 5xx from `/status` doesn't necessarily mean the API itself is down. `/health` is a dependency-free liveness check on the same server, so a 200 there confirms the API is up and only the status endpoint is broken.

No new helpers and no synthetic UI rendering: the dotted status UI stays reserved for real data from the API (the same convention the dashboard's status indicator follows).
Test plan
- `/status` 5xx + `/health` 5xx prints "Kernel API is down…"
- `/status` 5xx + `/health` 200 prints "Kernel API is responding but /status is unavailable…"

🤖 Generated with Claude Code
Note
Low Risk
User-facing error strings and an extra HTTP GET on failure paths only; no change to successful status rendering or auth/data handling.
Overview
`kernel status` now reports three distinct failure modes instead of one generic “could not reach” message for every non-success case.

When `/status` returns a non-2xx response, the CLI probes `/health` (3s timeout). If `/health` is healthy, users see that the API is up but `/status` is unavailable (e.g. incident.io dependency). If `/health` is also unhealthy, the message says the API is down. Transport errors on `/status` still use the original “could not reach” wording.

Reviewed by Cursor Bugbot for commit 3d48cd0. Bugbot is set up for automated code reviews on this repo.