feat(lake): enforce retention TTL per dataset + soft per-org Lake quota (CL-9) #501

Merged: TerrifiedBug merged 1 commit into main from feat/cl-9-lake-quota-enforcement on Jun 8, 2026

@TerrifiedBug
Owner

CL-9 — Lake retention enforcement + per-org Lake byte quota (GA-hardening slice)

LakeRetentionPolicy(hotDays, coldDays) was attached to datasets but only the table-default TTL was ever applied, and there was no per-org Lake byte quota gate. This ships a coherent, testable enforcement slice for both.

Retention enforcement — src/server/services/lake/lake-retention.ts

  • effectiveRetention(policy) — pure; computes a dataset's effective hot/cold window, falls back to the table defaults (7/90) when no policy, and clamps coldDays >= hotDays.
  • buildLakeTtlClause(retention, coldTierEnabled) — pure; owns the ClickHouse TTL clause shape. migrate.ts now builds the base table TTL through this same function (buildLakeTtlClause(effectiveRetention(null), …)), so the table default and any per-dataset window share one code path (behaviour unchanged: still INTERVAL 7 DAY TO VOLUME 'cold' / INTERVAL 90 DAY DELETE / storage_policy = 'vf_hot_cold').
  • enforceDatasetRetention(...) — applies a dataset's coldDays as a drop horizon: a bounded, org+pipeline-scoped DELETE FROM lake_events WHERE … AND timestamp < {cutoff} (bound params), behind the shared lake client. Catches datasets whose policy is shorter than the global table TTL.
  • sweepLakeRetention() + a leader-gated scheduler (initLakeRetentionScheduler, mirrors lake-alerts) walk the catalog daily and enforce each dataset's horizon; wired into instrumentation.node.ts. Per-dataset errors are logged and skipped (one bad dataset never stalls the sweep).
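The two pure helpers above can be sketched as follows. This is a minimal illustration, not the code in lake-retention.ts: the `RetentionPolicy` and `EffectiveRetention` shapes, the `pickDays` helper, the fallback used for malformed values, and the exact TTL expression on `timestamp` are all assumptions. Only the 7/90 defaults, the `coldDays >= hotDays` clamp, and the `TO VOLUME 'cold'` / `DELETE` fragments come from the description.

```typescript
// Hypothetical shapes; the real types in lake-retention.ts may differ.
interface RetentionPolicy {
  hotDays?: number | null;
  coldDays?: number | null;
}
interface EffectiveRetention {
  hotDays: number;
  coldDays: number;
}

const DEFAULT_HOT_DAYS = 7; // table default stated in the PR
const DEFAULT_COLD_DAYS = 90; // table default stated in the PR

// Assumption: a malformed value (non-integer, zero, negative) falls back to the default.
function pickDays(value: unknown, fallback: number): number {
  return typeof value === "number" && Number.isInteger(value) && value > 0
    ? value
    : fallback;
}

function effectiveRetention(
  policy: RetentionPolicy | null | undefined,
): EffectiveRetention {
  const hotDays = pickDays(policy?.hotDays, DEFAULT_HOT_DAYS);
  // Clamp so the drop horizon is never earlier than the hot window.
  const coldDays = Math.max(pickDays(policy?.coldDays, DEFAULT_COLD_DAYS), hotDays);
  return { hotDays, coldDays };
}

// Assumption: the TTL is expressed directly on the `timestamp` column.
function buildLakeTtlClause(
  retention: EffectiveRetention,
  coldTierEnabled: boolean,
): string {
  const parts: string[] = [];
  if (coldTierEnabled) {
    parts.push(`timestamp + INTERVAL ${retention.hotDays} DAY TO VOLUME 'cold'`);
  }
  parts.push(`timestamp + INTERVAL ${retention.coldDays} DAY DELETE`);
  return `TTL ${parts.join(", ")}`;
}

// Table default, as the migration runner would build it:
const defaultTtl = buildLakeTtlClause(effectiveRetention(null), true);
// "TTL timestamp + INTERVAL 7 DAY TO VOLUME 'cold', timestamp + INTERVAL 90 DAY DELETE"
```

Keeping both functions pure means the migration runner and the per-dataset path can be unit-tested without a ClickHouse connection.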

Retention deletion is the intended data lifecycle, so this path does delete expired rows — distinct from the quota path, which never drops.
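A rough sketch of how the drop-horizon statement might be assembled with bound parameters is shown below. The column names `organization_id` and `pipeline_id`, the `buildRetentionDelete` helper, and the choice to bind the cutoff as Unix seconds are assumptions for illustration; only the `DELETE FROM lake_events … AND timestamp < {cutoff}` shape and the org+pipeline scoping come from the description. The returned object has the `query` / `query_params` shape that @clickhouse/client accepts, but nothing is executed here.

```typescript
// Hypothetical command shape, mirroring the { query, query_params } pair
// that would be handed to the shared lake client.
interface DeleteCommand {
  query: string;
  query_params: Record<string, string | number>;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Builds a bounded, org+pipeline-scoped delete for rows older than coldDays.
// Values are bound as parameters, never interpolated into the SQL text.
function buildRetentionDelete(
  orgId: string,
  pipelineId: string,
  coldDays: number,
  now: Date = new Date(),
): DeleteCommand {
  const cutoffSeconds = Math.floor((now.getTime() - coldDays * MS_PER_DAY) / 1000);
  return {
    query:
      "DELETE FROM lake_events " +
      "WHERE organization_id = {orgId:String} " + // assumed column name
      "AND pipeline_id = {pipelineId:String} " + // assumed column name
      "AND timestamp < toDateTime({cutoff:UInt32})",
    query_params: { orgId, pipelineId, cutoff: cutoffSeconds },
  };
}

// Example: a dataset with a 30-day cold window, evaluated at a fixed instant.
const cmd = buildRetentionDelete(
  "org_1",
  "pipe_1",
  30,
  new Date("2026-06-08T00:00:00Z"),
);
```

Scoping every delete to one org and one pipeline keeps the blast radius of a bad policy limited to that dataset.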

Per-org Lake byte quota — src/server/services/lake/lake-quota.ts

  • LakeQuotaProvider seam mirroring src/server/services/quotas.ts: OSS ships DefaultUnlimitedLakeQuotaProvider (unmetered self-hosted); a commercial deployment registers its own via setLakeQuotaProvider(...) to map an org's tier → byte ceiling.
  • checkLakeQuota(orgId, currentBytes, quotaBytes) — pure (under/over/at/unlimited/zero-ceiling).
  • evaluateLakeQuota(orgId) — sums the catalog byteCount and soft-signals when over quota: a structured warn log + the read-only lake.quotaStatus tRPC query (UI badge surface). It never drops, rejects, or rewrites data. Wired into the catalog/heartbeat ingest path (updateLakeCatalogFromHeartbeat), best-effort. Free in OSS — the default provider short-circuits before any DB read.
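The provider seam and the pure check could look roughly like this. It is a sketch under several assumptions: `null` standing for "unlimited", "at ceiling" not counting as over, the `getQuotaBytes` method name, the `LakeQuotaStatus` shape, and a synchronous provider (kept synchronous for brevity) are all hypothetical. The names `LakeQuotaProvider`, `DefaultUnlimitedLakeQuotaProvider`, `setLakeQuotaProvider`, and `checkLakeQuota` are taken from the description.

```typescript
// Hypothetical result shape for the read-only status surface.
interface LakeQuotaStatus {
  orgId: string;
  currentBytes: number;
  quotaBytes: number | null; // assumption: null means unlimited
  overQuota: boolean;
}

// Pure check: no I/O and no mutation.
// Assumption: sitting exactly at the ceiling is not "over".
function checkLakeQuota(
  orgId: string,
  currentBytes: number,
  quotaBytes: number | null,
): LakeQuotaStatus {
  const overQuota = quotaBytes !== null && currentBytes > quotaBytes;
  return { orgId, currentBytes, quotaBytes, overQuota };
}

interface LakeQuotaProvider {
  // Hypothetical method name; returns the org's byte ceiling or null.
  getQuotaBytes(orgId: string): number | null;
}

// OSS default: unmetered, so callers can short-circuit before any DB read.
class DefaultUnlimitedLakeQuotaProvider implements LakeQuotaProvider {
  getQuotaBytes(): null {
    return null;
  }
}

let lakeQuotaProvider: LakeQuotaProvider = new DefaultUnlimitedLakeQuotaProvider();

// A commercial deployment swaps in its own tier-to-ceiling mapping.
function setLakeQuotaProvider(provider: LakeQuotaProvider): void {
  lakeQuotaProvider = provider;
}

function getLakeQuotaProvider(): LakeQuotaProvider {
  return lakeQuotaProvider;
}
```

Because the check only returns a status object, the soft signal (warn log and UI badge) stays separate from any data path, which is what keeps the quota side free of drops or rejections.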

No migration

Over-quota is derived on read (sum of LakeDataset.byteCount vs the provider ceiling) and retention reuses the existing LakeRetentionPolicy/LakeDataset fields — so no schema change was needed (preferred per the task).

Tests (npx vitest run src/server/services/lake/ → 116 pass)

  • lake-quota.test.ts — pure check under/over/at-ceiling/unlimited/zero; evaluateLakeQuota unmetered short-circuit, under-quota (no signal), over-quota fires the signal with zero data mutation (explicit no-data-drop invariant), empty-catalog null sum.
  • lake-retention.test.ts: effectiveRetention defaults/policy/malformed/clamp; buildLakeTtlClause cold on/off + per-dataset window; enforceDatasetRetention disabled no-op + per-policy coldDays cutoff + bound-param scope; sweepLakeRetention disabled no-op, per-dataset horizons, error tolerance.
  • Prisma and @clickhouse/client are mocked.

Verification notes

  • npx tsc --noEmit: clean except the pre-existing @clickhouse/client module-resolution error in clickhouse.ts (the native dep isn't installed in this dev env; resolves on CI).
  • lake.quotaStatus takes no tenant-id input (org from ctx.organizationId), so the cross-org-access linter skips it — verified by loading appRouter and inspecting the live procedure.
  • ClickHouse/scheduler/instrumentation paths aren't locally runnable; unit tests + types are the bar.

LakeRetentionPolicy was attached to datasets but never enforced beyond the
table-default TTL, and there was no per-org Lake byte quota gate. Ship a
coherent GA-hardening enforcement slice (CL-9).

Retention enforcement (lake-retention.ts):
- effectiveRetention(policy) computes a dataset's effective hot/cold window,
  falling back to the table defaults (7/90) and clamping coldDays >= hotDays.
- buildLakeTtlClause() owns the ClickHouse TTL-clause shape; the migration
  runner now builds its base table TTL through it, so the table default and any
  per-dataset window share one code path.
- enforceDatasetRetention() applies a dataset's coldDays as a bounded,
  org+pipeline-scoped DELETE drop horizon (behind the shared lake client),
  catching datasets whose policy is shorter than the global table TTL.
- sweepLakeRetention() + a leader-gated scheduler (mirroring lake-alerts) run
  it across the catalog; wired into instrumentation.

Per-org Lake byte quota (lake-quota.ts):
- Injectable LakeQuotaProvider seam mirroring quotas.ts; OSS default is
  unlimited (unmetered self-hosted), cloud overrides via its tier provider.
- checkLakeQuota() is pure; evaluateLakeQuota() sums the catalog and SOFT-signals
  when over quota (warn log + read-only lake.quotaStatus surface) — it never
  drops, rejects, or rewrites data. Wired into the catalog/heartbeat ingest path.

No migration: over-quota is derived on read and retention reuses existing
LakeRetentionPolicy/LakeDataset fields.

Tests: quota under/over/at/unlimited/zero-ceiling + no-data-drop invariant;
retention TTL computation per policy + per-dataset drop-horizon DELETE + sweep
error tolerance. Prisma and @clickhouse/client mocked.
@TerrifiedBug merged commit 4ec45dd into main on Jun 8, 2026
17 checks passed
@TerrifiedBug deleted the feat/cl-9-lake-quota-enforcement branch on Jun 8, 2026 at 14:06