feat(lake): enforce retention TTL per dataset + soft per-org Lake quota (CL-9) #501
Merged
LakeRetentionPolicy was attached to datasets but never enforced beyond the table-default TTL, and there was no per-org Lake byte quota gate. Ship a coherent GA-hardening enforcement slice (CL-9).

Retention enforcement (lake-retention.ts):
- effectiveRetention(policy) computes a dataset's effective hot/cold window, falling back to the table defaults (7/90) and clamping coldDays >= hotDays.
- buildLakeTtlClause() owns the ClickHouse TTL-clause shape; the migration runner now builds its base table TTL through it, so the table default and any per-dataset window share one code path.
- enforceDatasetRetention() applies a dataset's coldDays as a bounded, org+pipeline-scoped DELETE drop horizon (behind the shared lake client), catching datasets whose policy is shorter than the global table TTL.
- sweepLakeRetention() + a leader-gated scheduler (mirroring lake-alerts) run it across the catalog; wired into instrumentation.

Per-org Lake byte quota (lake-quota.ts):
- Injectable LakeQuotaProvider seam mirroring quotas.ts; OSS default is unlimited (unmetered self-hosted), cloud overrides via its tier provider.
- checkLakeQuota() is pure; evaluateLakeQuota() sums the catalog and SOFT-signals when over quota (warn log + read-only lake.quotaStatus surface); it never drops, rejects, or rewrites data. Wired into the catalog/heartbeat ingest path.

No migration: over-quota is derived on read and retention reuses existing LakeRetentionPolicy/LakeDataset fields.

Tests: quota under/over/at/unlimited/zero-ceiling + no-data-drop invariant; retention TTL computation per policy + per-dataset drop-horizon DELETE + sweep error tolerance. Prisma and @clickhouse/client mocked.
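The retention computation described above can be illustrated with a minimal sketch. This is not the code from the PR: the `RetentionPolicy` and `EffectiveRetention` shapes, the `isPositiveInt` helper, and the exact TTL column expression (`timestamp + INTERVAL ...`) are assumptions made for illustration. Only the 7/90 defaults, the `coldDays >= hotDays` clamp, and the `TO VOLUME 'cold'` / `DELETE` clause parts come from the description.

```typescript
// Hypothetical shapes: the real LakeRetentionPolicy type is not shown in this PR text.
interface RetentionPolicy {
  hotDays?: number | null;
  coldDays?: number | null;
}

interface EffectiveRetention {
  hotDays: number;
  coldDays: number;
}

// Table defaults named in the PR description (7 hot days, 90 cold days).
const DEFAULT_HOT_DAYS = 7;
const DEFAULT_COLD_DAYS = 90;

// Assumed validation rule: treat anything that is not a positive integer as malformed.
function isPositiveInt(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value > 0;
}

// Pure: fall back to the defaults when the policy is missing or malformed,
// then clamp so that coldDays is never smaller than hotDays.
function effectiveRetention(policy: RetentionPolicy | null): EffectiveRetention {
  const hot = policy?.hotDays;
  const cold = policy?.coldDays;
  const hotDays = isPositiveInt(hot) ? hot : DEFAULT_HOT_DAYS;
  const rawColdDays = isPositiveInt(cold) ? cold : DEFAULT_COLD_DAYS;
  return { hotDays, coldDays: Math.max(rawColdDays, hotDays) };
}

// Pure: build the TTL clause. The `timestamp` column expression is an assumption;
// the day counts are validated integers, so interpolating them is safe.
function buildLakeTtlClause(retention: EffectiveRetention, coldTierEnabled: boolean): string {
  const parts: string[] = [];
  if (coldTierEnabled) {
    parts.push(`timestamp + INTERVAL ${retention.hotDays} DAY TO VOLUME 'cold'`);
  }
  parts.push(`timestamp + INTERVAL ${retention.coldDays} DAY DELETE`);
  return `TTL ${parts.join(", ")}`;
}

console.log(buildLakeTtlClause(effectiveRetention(null), true));
// prints: TTL timestamp + INTERVAL 7 DAY TO VOLUME 'cold', timestamp + INTERVAL 90 DAY DELETE
```

Routing both the table default (`effectiveRetention(null)`) and a per-dataset policy through the same two functions is what lets a single code path own the clause shape.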
CL-9 — Lake retention enforcement + per-org Lake byte quota (GA-hardening slice)
`LakeRetentionPolicy(hotDays, coldDays)` was attached to datasets but only the table-default TTL was ever applied, and there was no per-org Lake byte quota gate. This ships a coherent, testable enforcement slice for both.

Retention enforcement: `src/server/services/lake/lake-retention.ts`

- `effectiveRetention(policy)`: pure; computes a dataset's effective hot/cold window, falls back to the table defaults (7/90) when there is no policy, and clamps `coldDays >= hotDays`.
- `buildLakeTtlClause(retention, coldTierEnabled)`: pure; owns the ClickHouse `TTL` clause shape. `migrate.ts` now builds the base table TTL through this same function (`buildLakeTtlClause(effectiveRetention(null), …)`), so the table default and any per-dataset window share one code path (behaviour unchanged: still `INTERVAL 7 DAY TO VOLUME 'cold'` / `INTERVAL 90 DAY DELETE` / `storage_policy = 'vf_hot_cold'`).
- `enforceDatasetRetention(...)`: applies a dataset's `coldDays` as a drop horizon: a bounded, org+pipeline-scoped `DELETE FROM lake_events WHERE … AND timestamp < {cutoff}` (bound params), behind the shared lake client. Catches datasets whose policy is shorter than the global table TTL.
- `sweepLakeRetention()` + a leader-gated scheduler (`initLakeRetentionScheduler`, mirrors `lake-alerts`) walk the catalog daily and enforce each dataset's horizon; wired into `instrumentation.node.ts`. Per-dataset errors are logged and skipped (one bad dataset never stalls the sweep).

Per-org Lake byte quota: `src/server/services/lake/lake-quota.ts`

- `LakeQuotaProvider` seam mirroring `src/server/services/quotas.ts`: OSS ships `DefaultUnlimitedLakeQuotaProvider` (unmetered self-hosted); a commercial deployment registers its own via `setLakeQuotaProvider(...)` to map an org's tier → byte ceiling.
- `checkLakeQuota(orgId, currentBytes, quotaBytes)`: pure (under/over/at/unlimited/zero-ceiling).
- `evaluateLakeQuota(orgId)`: sums the catalog `byteCount` and soft-signals when over quota: a structured `warn` log + the read-only `lake.quotaStatus` tRPC query (UI badge surface). It never drops, rejects, or rewrites data. Wired into the catalog/heartbeat ingest path (`updateLakeCatalogFromHeartbeat`), best-effort. Free in OSS: the default provider short-circuits before any DB read.

No migration

Over-quota is derived on read (sum of `LakeDataset.byteCount` vs the provider ceiling) and retention reuses the existing `LakeRetentionPolicy`/`LakeDataset` fields, so no schema change was needed (preferred per the task).

Tests (`npx vitest run src/server/services/lake/` → 116 pass)

- `lake-quota.test.ts`: pure check under/over/at-ceiling/unlimited/zero; `evaluateLakeQuota` unmetered short-circuit, under-quota (no signal), over-quota fires the signal with zero data mutation (explicit no-data-drop invariant), empty-catalog null sum.
- `lake-retention.test.ts`: `effectiveRetention` defaults/policy/malformed/clamp; `buildLakeTtlClause` cold on/off + per-dataset window; `enforceDatasetRetention` disabled no-op + per-policy `coldDays` cutoff + bound-param scope; `sweepLakeRetention` disabled no-op, per-dataset horizons, error tolerance.
- Prisma and `@clickhouse/client` are mocked.

Verification notes

- `npx tsc --noEmit`: clean except the pre-existing `@clickhouse/client` module-resolution error in `clickhouse.ts` (the native dep isn't installed in this dev env; resolves on CI).
- `lake.quotaStatus` takes no tenant-id input (org from `ctx.organizationId`), so the cross-org-access linter skips it; verified by loading `appRouter` and inspecting the live procedure.
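For readers who want a concrete picture of the pure quota check, here is a minimal sketch of what `checkLakeQuota(orgId, currentBytes, quotaBytes)` might look like. It is not the code from the PR. The `LakeQuotaResult` shape, the use of `null` to mean "unlimited", and the boundary rule (usage exactly at the ceiling is not over quota; a zero ceiling makes any stored byte over quota) are all assumptions, since the description only lists the cases that are tested.

```typescript
// Hypothetical result shape; the real return type is not shown in this PR text.
type LakeQuotaState = "unlimited" | "under" | "at" | "over";

interface LakeQuotaResult {
  orgId: string;
  currentBytes: number;
  quotaBytes: number | null; // Assumption: null means an unmetered (unlimited) org.
  state: LakeQuotaState;
  overQuota: boolean;
}

// Pure: no I/O and no mutation; it only classifies usage against the ceiling.
// Assumption: "over" means strictly greater than the ceiling.
function checkLakeQuota(
  orgId: string,
  currentBytes: number,
  quotaBytes: number | null,
): LakeQuotaResult {
  if (quotaBytes === null) {
    return { orgId, currentBytes, quotaBytes, state: "unlimited", overQuota: false };
  }
  let state: LakeQuotaState;
  if (currentBytes > quotaBytes) {
    state = "over";
  } else if (currentBytes === quotaBytes) {
    state = "at";
  } else {
    state = "under";
  }
  return { orgId, currentBytes, quotaBytes, state, overQuota: state === "over" };
}

// Usage: the caller decides what a soft signal means (for example a warn log);
// the check itself never touches stored data.
const result = checkLakeQuota("org_example", 150, 100);
if (result.overQuota) {
  console.warn("lake quota exceeded", result);
}
```

Keeping the check free of side effects is what makes the no-data-drop invariant easy to test: the only observable effect of being over quota is whatever signal the caller chooses to emit.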