Skip to content

feat(webapp): consolidate auth path + add comprehensive auth tests#3499

Open
matt-aitken wants to merge 93 commits intomainfrom
rbac-packages
Open

feat(webapp): consolidate auth path + add comprehensive auth tests#3499
matt-aitken wants to merge 93 commits intomainfrom
rbac-packages

Conversation

@matt-aitken
Copy link
Copy Markdown
Member

Summary

Consolidates the webapp's authentication and authorization into a small set of route helpers, replacing the ad-hoc requireUser / requireUserId / authenticatedEnvironmentForAuthentication calls scattered across routes. Same security model, but the per-request flow (authenticate → authorize → load) now lives in one place per route family.

Adds a comprehensive end-to-end auth test suite that didn't exist before — 162 tests covering API key, PAT and JWT auth across the public API surface, plus dashboard session auth for admin pages.

Changes

Dashboard auth (started, partial rollout)

Admin and settings pages migrated to a unified loader/action helper that authenticates the session, runs an authorization check, and exposes the result to the route. Other dashboard routes still on the old pattern; remaining migration tracked separately.

Migrated routes:

  • admin.* (14 admin / back-office / feature-flags / LLM-models / notifications / orgs / concurrency pages)
  • _app.orgs.$organizationSlug.settings.team
  • _app.orgs.$organizationSlug.settings.roles

API / realtime / engine auth (complete for the migrated families)

71 routes migrated to a unified apiBuilder that centralizes Bearer / PAT / Public-JWT authentication and applies the per-route authorization check before the handler runs. Includes:

  • api.v1.* and api.v2.* and api.v3.* — tasks, runs, batches, queues, prompts, deployments, query, sessions, waitpoints, packets, workers, idempotency keys
  • realtime.v1.* — runs, batches, sessions, streams
  • engine.v1.* — dev / worker-action protocols

Side effect: action aliases preserved historic JWT scope semantics where the new model is stricter (e.g. a write:tasks JWT now also satisfies trigger / batchTrigger / update actions on the same resource — matched at the auth boundary, not in the route handler).

Auth test suite (new — *.e2e.full.test.ts)

162 e2e tests run against a real spawned webapp + Postgres (no mocks). Coverage matrix:

  • API key auth — read / write / trigger / batchTrigger / deploy actions across runs, batches, deployments, prompts, queues, query, sessions, input-streams, waitpoints, tasks, idempotency keys; multi-key resources (a run carries batch / tag / task identifiers — auth must accept any matching scope)
  • Personal Access Token auth — comprehensive matrix: scope match, scope mismatch, missing scope, expired token, malformed token
  • Public JWT auth — sub-vs-URL environment resolution, expired JWTs, signature verification, scope checking, otu (one-time-use) token semantics, branch-environment signing-key fallback
  • Dashboard session auth — admin-only pages reject non-admins; per-action gating
  • Cross-cutting edge cases — revoked API key grace window, JWT cross-environment isolation, MissingResource branch behaviour

Test plan

  • pnpm run typecheck --filter webapp clean
  • pnpm exec vitest run --config apps/webapp/vitest.e2e.full.config.ts — 162/162 pass
  • Spot-check an authed API endpoint with a valid + invalid API key against a local stack
  • Spot-check the migrated admin pages render and gate non-admins

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 1, 2026

🦋 Changeset detected

Latest commit: 9f987ae

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 31 packages
Name Type
@trigger.dev/core Patch
@trigger.dev/plugins Patch
@trigger.dev/build Patch
trigger.dev Patch
@trigger.dev/python Patch
@trigger.dev/redis-worker Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/llm-model-catalog Patch
@trigger.dev/rbac Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
references-telemetry Patch
@internal/sdk-compat-tests Patch
@trigger.dev/react-hooks Patch
@trigger.dev/rsc Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d6f9a342-cadc-4fb4-99d7-55428a16af28

📥 Commits

Reviewing files that changed from the base of the PR and between 2674417 and 9f987ae.

📒 Files selected for processing (3)
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.settings.team/route.tsx
  • apps/webapp/app/routes/api.v1.runs.ts
  • apps/webapp/app/routes/realtime.v1.runs.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • apps/webapp/app/routes/realtime.v1.runs.ts
  • apps/webapp/app/routes/api.v1.runs.ts
  • apps/webapp/app/routes/_app.orgs.$organizationSlug.settings.team/route.tsx
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: units / e2e-webapp / 🧪 E2E Tests: Webapp
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: typecheck / typecheck
  • GitHub Check: 🛡️ E2E Auth Tests (full)
  • GitHub Check: Analyze (javascript-typescript)

Walkthrough

Adds RBAC plugin and contracts plus a lazy fallback, converts many route authorizations to typed {type,id}/anyResource/everyResource forms, centralizes API/PAT/session auth in updated route builders, introduces dashboardLoader/dashboardAction, wires RBAC into webapp services and UI (Roles page, team RolePicker, invite/token role fields), unifies AuthenticatedEnvironment types across packages, extracts scheduleEmail, adds DB migration for invites, and provides extensive Vitest e2e suites with shared test server/helpers and a CI workflow.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch rbac-packages

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

matt-aitken added a commit that referenced this pull request May 4, 2026
Five real-bug fixes from CodeRabbit + Devin review of #3499:

#1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string
   was 'RBAC fallback not installed' but the OSS fallback actually
   returns 'RBAC plugin not installed'. The mismatch made every PAT
   creation with a roleId hit the compensating-delete branch on
   self-hosters with no plugin installed — including the dashboard
   PAT-creation flow. Aligns the constant with the canonical string.

#2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped
   the revoked-API-key grace window (RevokedApiKey table), so a
   freshly-rotated env API key would 401 immediately on the new auth
   path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey
   logic so the auth-cross-cutting e2e tests pass.

#3 api.v1.query.ts: multi-table queries built a plain RbacResource
   array, which checkAuth treats as 'any element passes'. A JWT
   scoped to one detected table could submit a query that also reads
   another table it isn't scoped to. Wrap with everyResource — same
   AND-semantics fix as the batch trigger routes.

#4 account.tokens/route.tsx: defaultRoleId could land on a custom or
   plan-blocked role when userRoleId wasn't in the picker's assignable
   set. The action's submit-revalidation would then 400 until the user
   manually changed the dropdown. Clamp the default to roles the picker
   actually renders.

#5 settings.team/route.tsx: the role Select used defaultValue, so a
   failed set-role submit left the attempted role visible while the
   server kept the old one. Switch to a controlled value bound to
   currentRoleId.
devin-ai-integration[bot]

This comment was marked as resolved.

github-advanced-security[bot]

This comment was marked as resolved.

matt-aitken added a commit that referenced this pull request May 6, 2026
Five real-bug fixes from CodeRabbit + Devin review of #3499:

#1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string
   was 'RBAC fallback not installed' but the OSS fallback actually
   returns 'RBAC plugin not installed'. The mismatch made every PAT
   creation with a roleId hit the compensating-delete branch on
   self-hosters with no plugin installed — including the dashboard
   PAT-creation flow. Aligns the constant with the canonical string.

#2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped
   the revoked-API-key grace window (RevokedApiKey table), so a
   freshly-rotated env API key would 401 immediately on the new auth
   path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey
   logic so the auth-cross-cutting e2e tests pass.

#3 api.v1.query.ts: multi-table queries built a plain RbacResource
   array, which checkAuth treats as 'any element passes'. A JWT
   scoped to one detected table could submit a query that also reads
   another table it isn't scoped to. Wrap with everyResource — same
   AND-semantics fix as the batch trigger routes.

#4 account.tokens/route.tsx: defaultRoleId could land on a custom or
   plan-blocked role when userRoleId wasn't in the picker's assignable
   set. The action's submit-revalidation would then 400 until the user
   manually changed the dropdown. Clamp the default to roles the picker
   actually renders.

#5 settings.team/route.tsx: the role Select used defaultValue, so a
   failed set-role submit left the attempted role visible while the
   server kept the old one. Switch to a controlled value bound to
   currentRoleId.
@matt-aitken matt-aitken marked this pull request as ready for review May 6, 2026 15:49
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 3 new potential issues.

View 11 additional findings in Devin Review.

Open in Devin Review

matt-aitken added a commit that referenced this pull request May 6, 2026
Five real-bug fixes from CodeRabbit + Devin review of #3499:

#1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string
   was 'RBAC fallback not installed' but the OSS fallback actually
   returns 'RBAC plugin not installed'. The mismatch made every PAT
   creation with a roleId hit the compensating-delete branch on
   self-hosters with no plugin installed — including the dashboard
   PAT-creation flow. Aligns the constant with the canonical string.

#2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped
   the revoked-API-key grace window (RevokedApiKey table), so a
   freshly-rotated env API key would 401 immediately on the new auth
   path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey
   logic so the auth-cross-cutting e2e tests pass.

#3 api.v1.query.ts: multi-table queries built a plain RbacResource
   array, which checkAuth treats as 'any element passes'. A JWT
   scoped to one detected table could submit a query that also reads
   another table it isn't scoped to. Wrap with everyResource — same
   AND-semantics fix as the batch trigger routes.

#4 account.tokens/route.tsx: defaultRoleId could land on a custom or
   plan-blocked role when userRoleId wasn't in the picker's assignable
   set. The action's submit-revalidation would then 400 until the user
   manually changed the dropdown. Clamp the default to roles the picker
   actually renders.

#5 settings.team/route.tsx: the role Select used defaultValue, so a
   failed set-role submit left the attempted role visible while the
   server kept the old one. Switch to a controlled value bound to
   currentRoleId.
devin-ai-integration[bot]

This comment was marked as resolved.

matt-aitken added a commit that referenced this pull request May 6, 2026
zizmor flagged on PR #3499:
- L72 actions/checkout@v4 unpinned → @de0fac2... v6.0.2 (matches the
  rest of the repo).
- L74 credential persistence → add `persist-credentials: false`. This
  job doesn't push, so leaving GITHUB_TOKEN in .git/config has no
  upside and a leak path through any subsequent step.
- L82 buildjet/setup-node@v4 unpinned → switched to actions/setup-node
  @48b55a01... v6.4.0 (already SHA-pinned and used by other workflows;
  this job runs on ubuntu-latest, no buildjet runner needed).
- L89 docker/login-action@v3 unpinned → @4907a6dd... v4.1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
matt-aitken added a commit that referenced this pull request May 7, 2026
…orts fix

Four issues flagged on PR #3499:

1. (apps/webapp/app/models/runtimeEnvironment.server.ts) toAuthenticated()
   was documented as coercing Prisma's Decimal `concurrencyLimitBurstFactor`
   to a number but actually passed it through unchanged. Slim AE accepts
   `number | DecimalLike`, so it compiled, but at runtime any consumer
   doing arithmetic would get NaN. Now calls `.toNumber()` explicitly.

2. (internal-packages/rbac/src/fallback.ts) Same issue in the fallback's
   toAuthenticatedEnvironment — was a typed identity function. Now coerces
   the burst factor to number via the same union-narrowing pattern used
   by the EnvironmentQueuePresenter.

3 + 4. (internal-packages/rbac/src/fallback.ts, apps/webapp/app/services/rbac.server.ts)
   The fallback was using the primary PrismaClient for read-only auth-path
   queries (env lookup, revoked-key grace, session user lookup, PAT
   lookup). Pre-PR, findEnvironmentByApiKey used $replica for these.
   Shifting auth-path reads from replica to primary is a real perf
   regression on high-traffic deployments.

   RoleBaseAccessFallback now accepts either a single PrismaClient or
   `{ primary, replica }`. The webapp passes both — `prisma` for writes
   (role mutations) and `$replica` for the read-only authenticate*
   paths. The fallback's role-mutation methods are no-op stubs in the
   plugin-not-installed case, so writes only need the primary
   nominally; the type kept for symmetry.

5. (packages/core/package.json) Two new subpath exports added in this PR
   — `v3/auth/environment` and `v3/utils/gitBranch` — were missing from
   the `typesVersions` block. attw flagged "resolution failed" on both,
   failing the `check-exports` CI job. Added the entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@matt-aitken matt-aitken force-pushed the rbac-packages branch 3 times, most recently from 22345f6 to e9340b2 Compare May 8, 2026 16:23
matt-aitken added a commit that referenced this pull request May 8, 2026
Five real-bug fixes from CodeRabbit + Devin review of #3499:

#1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string
   was 'RBAC fallback not installed' but the OSS fallback actually
   returns 'RBAC plugin not installed'. The mismatch made every PAT
   creation with a roleId hit the compensating-delete branch on
   self-hosters with no plugin installed — including the dashboard
   PAT-creation flow. Aligns the constant with the canonical string.

#2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped
   the revoked-API-key grace window (RevokedApiKey table), so a
   freshly-rotated env API key would 401 immediately on the new auth
   path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey
   logic so the auth-cross-cutting e2e tests pass.

#3 api.v1.query.ts: multi-table queries built a plain RbacResource
   array, which checkAuth treats as 'any element passes'. A JWT
   scoped to one detected table could submit a query that also reads
   another table it isn't scoped to. Wrap with everyResource — same
   AND-semantics fix as the batch trigger routes.

#4 account.tokens/route.tsx: defaultRoleId could land on a custom or
   plan-blocked role when userRoleId wasn't in the picker's assignable
   set. The action's submit-revalidation would then 400 until the user
   manually changed the dropdown. Clamp the default to roles the picker
   actually renders.

#5 settings.team/route.tsx: the role Select used defaultValue, so a
   failed set-role submit left the attempted role visible while the
   server kept the old one. Switch to a controlled value bound to
   currentRoleId.
matt-aitken added a commit that referenced this pull request May 8, 2026
zizmor flagged on PR #3499:
- L72 actions/checkout@v4 unpinned → @de0fac2... v6.0.2 (matches the
  rest of the repo).
- L74 credential persistence → add `persist-credentials: false`. This
  job doesn't push, so leaving GITHUB_TOKEN in .git/config has no
  upside and a leak path through any subsequent step.
- L82 buildjet/setup-node@v4 unpinned → switched to actions/setup-node
  @48b55a01... v6.4.0 (already SHA-pinned and used by other workflows;
  this job runs on ubuntu-latest, no buildjet runner needed).
- L89 docker/login-action@v3 unpinned → @4907a6dd... v4.1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
matt-aitken added a commit that referenced this pull request May 8, 2026
…orts fix

Four issues flagged on PR #3499:

1. (apps/webapp/app/models/runtimeEnvironment.server.ts) toAuthenticated()
   was documented as coercing Prisma's Decimal `concurrencyLimitBurstFactor`
   to a number but actually passed it through unchanged. Slim AE accepts
   `number | DecimalLike`, so it compiled, but at runtime any consumer
   doing arithmetic would get NaN. Now calls `.toNumber()` explicitly.

2. (internal-packages/rbac/src/fallback.ts) Same issue in the fallback's
   toAuthenticatedEnvironment — was a typed identity function. Now coerces
   the burst factor to number via the same union-narrowing pattern used
   by the EnvironmentQueuePresenter.

3 + 4. (internal-packages/rbac/src/fallback.ts, apps/webapp/app/services/rbac.server.ts)
   The fallback was using the primary PrismaClient for read-only auth-path
   queries (env lookup, revoked-key grace, session user lookup, PAT
   lookup). Pre-PR, findEnvironmentByApiKey used $replica for these.
   Shifting auth-path reads from replica to primary is a real perf
   regression on high-traffic deployments.

   RoleBaseAccessFallback now accepts either a single PrismaClient or
   `{ primary, replica }`. The webapp passes both — `prisma` for writes
   (role mutations) and `$replica` for the read-only authenticate*
   paths. The fallback's role-mutation methods are no-op stubs in the
   plugin-not-installed case, so writes only need the primary
   nominally; the type kept for symmetry.

5. (packages/core/package.json) Two new subpath exports added in this PR
   — `v3/auth/environment` and `v3/utils/gitBranch` — were missing from
   the `typesVersions` block. attw flagged "resolution failed" on both,
   failing the `check-exports` CI job. Added the entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
matt-aitken and others added 23 commits May 8, 2026 18:43
Adds a batch variant of getUserRole that returns a Map<userId, Role | null>
in one round-trip. Used by TeamPresenter to drop the N+1 it had been
documenting as a future optimisation. Org-scoped only — project-scoped
reads still go through getUserRole (less common, not worth complicating
the API for).
Adds an optional `group` field to Permission so the plugin labels and
orders permission sections rather than the OSS hardcoding a name → group
map. Section order on the Roles page now follows the order permissions
appear in `allPermissions()` — first group seen, first rendered. Drops
the previous PERMISSION_GROUP_BY_NAME / GROUP_ORDER constants from the
webapp.
…mantics

The 6 session routes merged via PR #3417 were authored against the
pre-RBAC apiBuilder API: `authorization.resource` returned shapes like
`{ sessions: 'abc' }`, with a parallel `superScopes: [...]` whitelist
for broad-scope bypass. Post-TRI-8719, that shape doesn't typecheck and
`superScopes` is dead code.

Convert each resource callback to the canonical `{ type, id? }` shape.
For the two routes whose resource type is `tasks` but whose old
superScopes included `<action>:sessions` (list and create), use a
multi-key array `[{ type: 'tasks', id }, { type: 'sessions' }]` so a
JWT scoped `<action>:sessions` (no id) still passes — preserving the
exact allow-set the old superScopes mechanism granted. `*:all` and
`admin*` were already handled by the JWT ability's wildcard branches.

Drop the now-dead `superScopes` field from all 9 entries.

Adds e2e coverage in `auth-api.e2e.full.test.ts` (34 new tests, ~9
sub-describes) that locks in: per-task narrowing, `<action>:sessions`
type-only equivalence to the old superScope, `*:all` and `admin*`
bypass, wrong-action / wrong-id rejection. Plus a new
`seedTestApiSession` helper for inserting Session rows via Prisma —
distinct from the existing `seedTestSession` (cookie-session helper
for dashboard tests).
Adds RoleBaseAccessController.authenticatePat — PATs identify the user;
the effective ability is min(user's role in target org, max-role cap).
The user's actual membership is the floor (auto-narrows on demotion or
removal); the cap is set at PAT creation as a deliberate ceiling.

OSS-side:
- @trigger.dev/plugins gains the PatAuthResult type + authenticatePat
  on the controller interface.
- Fallback validates the PAT (prefix, hashed lookup, revoked check) and
  returns a permissive ability — preserves the pre-RBAC behaviour where
  PATs were pure user-identity tokens. Self-hosters see no change.
- LazyController delegates with the existing withActionAliases wrapper.

apiBuilder:
- createLoaderPATApiRoute accepts an optional context callback to
  derive { organizationId?, projectId? } and an optional authorization
  block. When either is declared, rbac.authenticatePat runs and the
  ability flows into the handler. Routes that don't opt in stay on the
  legacy permissive path.
- api.v1.projects.$projectRef.runs.ts opts in: context resolves
  projectRef -> organizationId, authorization is read on type runs.

UI:
- account.tokens picker reframed as 'Maximum role' with a hint
  explaining the cap-vs-floor model. Underlying TokenRole storage
  unchanged; semantic flip from 'bound role' to 'max role cap'.

The role chosen at PAT creation now actually constrains the token
(previously TokenRole was written but never read at request time).
The workflow pinned pnpm 10.23.0; root package.json declares 10.33.2.
pnpm/action-setup v4+ now rejects the conflict (root + workflow are
both 'specified versions' as far as the action is concerned), failing
the job before it can run any tests. Bump to v5.0.0 of the action and
match the 10.33.2 pin already used by unit-tests-webapp.yml and
release.yml.
The webapp unit-test config excluded *.e2e.test.ts (smoke matrix) but
not *.e2e.full.test.ts. The full auth suite needs a globalSetup that
spawns a webapp + Postgres container, which only the dedicated
vitest.e2e.full.config.ts provides. CI's unit-test shards were picking
up the e2e-full files via the include glob and failing immediately
with 'globalSetup didn't provide baseUrl/databaseUrl'.
Five real-bug fixes from CodeRabbit + Devin review of #3499:

#1 personalAccessToken.server.ts: FALLBACK_NOT_INSTALLED_ERROR string
   was 'RBAC fallback not installed' but the OSS fallback actually
   returns 'RBAC plugin not installed'. The mismatch made every PAT
   creation with a roleId hit the compensating-delete branch on
   self-hosters with no plugin installed — including the dashboard
   PAT-creation flow. Aligns the constant with the canonical string.

#2 internal-packages/rbac/src/fallback.ts: authenticateBearer skipped
   the revoked-API-key grace window (RevokedApiKey table), so a
   freshly-rotated env API key would 401 immediately on the new auth
   path. Mirrors findEnvironmentByApiKey's fallback-to-RevokedApiKey
   logic so the auth-cross-cutting e2e tests pass.

#3 api.v1.query.ts: multi-table queries built a plain RbacResource
   array, which checkAuth treats as 'any element passes'. A JWT
   scoped to one detected table could submit a query that also reads
   another table it isn't scoped to. Wrap with everyResource — same
   AND-semantics fix as the batch trigger routes.

#4 account.tokens/route.tsx: defaultRoleId could land on a custom or
   plan-blocked role when userRoleId wasn't in the picker's assignable
   set. The action's submit-revalidation would then 400 until the user
   manually changed the dropdown. Clamp the default to roles the picker
   actually renders.

#5 settings.team/route.tsx: the role Select used defaultValue, so a
   failed set-role submit left the attempted role visible while the
   server kept the old one. Switch to a controlled value bound to
   currentRoleId.
Locks in the everyResource(...) wrap added in 5547e66. Two new tests:
JWT scoped to one of multiple detected tables → 403; JWT scoped to all
detected tables → auth passes. Mirrors the every-task lock-in for the
batch trigger routes.
…-resource sites

Bare RbacResource[] in `authorization.resource` is now a type error.
Multi-resource auth must wrap with one of:

  - anyResource(...): succeed if any element passes (the existing default;
    used when one record carries multiple identifiers — runs by friendlyId
    / batch / tags / task — so a JWT scoped to any one grants access)
  - everyResource(...): succeed only if every element passes (existing
    helper; used by batch operations and the multi-table query route)

The OR-loophole class of bug CodeRabbit caught on api.v1.query — a JWT
scoped to one of N detected tables was authorized for the whole multi-
table query — was patchable per-route with everyResource. The Symbol
marker stayed invisible: future authors would still default to bare
arrays. Tightening AuthResource flushed out 11 routes that were silently
on the OR path; each is wrapped explicitly now.

Also inline the unrelated private `anyResource` helper in
internal-packages/rbac/src/ability.ts so the public name is unambiguous.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1 internal-packages/rbac/src/ability.ts (severity: 🔴 silent privilege
   escalation): buildJwtAbility was treating any scope starting with
   `admin:` as a universal wildcard. Pre-RBAC, the legacy
   checkAuthorization string-matched superScopes — `admin:sessions` only
   granted access to routes that explicitly listed it. After the JWT-
   ability split, the same scope was returning true for every action on
   every resource. Restrict the bypass to bare `admin` (no second
   segment); `admin:<type>` now flows through normal matching as
   action="admin" against resources of that type. Adds 2 regression
   tests in ability.test.ts.

#2 apps/webapp/app/services/routeBuilders/apiBuilder.server.ts (status
   discard): authenticateRequestForApiBuilder hardcoded `status: 401`
   even though BearerAuthResult.status is `401 | 403`. A plugin
   returning 403 (e.g. suspended account, IP block) would silently get
   downgraded to 401 — semantically wrong (401 = "who are you?", 403 =
   "you're not allowed") and confusing for client retry logic. Plumb
   result.status through.

#3 apps/webapp/app/services/routeBuilders/apiBuilder.server.ts
   (everyResource([]) vacuous truth): [].every() returns true, so
   everyResource([]) was passing auth for any token. Not exploitable
   today (Zod rejects empty bodies before auth), but the auth layer
   should never grant on empty input. Same defensive guard added to
   anyResource() for symmetry — only PERMISSIVE_ABILITY would have
   granted there, but the pattern shouldn't depend on the ability's
   choice.

#4 internal-packages/rbac/src/fallback.ts (PREVIEW env regression): the
   fallback's authenticateBearer looked up environments by apiKey only,
   skipping the branch-aware resolution that findEnvironmentByApiKey
   does for PREVIEW envs. Self-hosters using preview/branch envs would
   either fail or operate against the parent env. Mirror the legacy
   path: read x-trigger-branch, include matching child env, and pivot
   the resolved env to the child (apiKey/orgMember/organization/project
   inherited from parent). sanitizeBranchName inlined here because
   internal-packages can't import webapp code; comment notes the
   duplication.

All four flagged by Devin's PR review. Cloud plugin's buildJwtAbility
gets the same #1 fix in a sibling commit on this PR's cloud branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…core/v3/utils/gitBranch

Both helpers were originally in apps/webapp/app/v3/gitBranch.ts.
internal-packages/rbac needed sanitizeBranchName for the PREVIEW-env
branch resolution added in 8246234, and copy-pasting the function into
the fallback was a smell. Moves the canonical home into core (no
internal-package can import webapp code), and updates the four webapp
call sites + the rbac fallback to import from
@trigger.dev/core/v3/utils/gitBranch.

`sanitizeBranchName`'s input type is widened slightly to
`string | null | undefined` so callers passing `Headers.get(...)` (which
returns `string | null`) don't need a `?? undefined` workaround. The
existing webapp callers all pass `string | undefined`; the new union is
backwards-compatible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…minate redundant findEnvironmentById

Plugins now return the full env shape they fetched. The apiBuilder bridge passes it straight through to handlers — no follow-up findEnvironmentById round-trip.

@trigger.dev/core/v3/auth/environment (new) defines the slim AuthenticatedEnvironment structurally, independent of @trigger.dev/database. @trigger.dev/plugins re-exports it plus sanitizeBranchName so cloud's workspace-linked plugin imports from one surface.

Internal-packages/rbac/src/fallback.ts drops toRbacEnvironment stripping. apiBuilder.server.ts bridge drops findEnvironmentById. Model finders coerce to slim shape. Various downstream signature updates accept slim AE.

Net: single DB call per API request.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
zizmor flagged on PR #3499:
- L72 actions/checkout@v4 unpinned → @de0fac2... v6.0.2 (matches the
  rest of the repo).
- L74 credential persistence → add `persist-credentials: false`. This
  job doesn't push, so leaving GITHUB_TOKEN in .git/config has no
  upside and a leak path through any subsequent step.
- L82 buildjet/setup-node@v4 unpinned → switched to actions/setup-node
  @48b55a01... v6.4.0 (already SHA-pinned and used by other workflows;
  this job runs on ubuntu-latest, no buildjet runner needed).
- L89 docker/login-action@v3 unpinned → @4907a6dd... v4.1.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…orts fix

Four issues flagged on PR #3499:

1. (apps/webapp/app/models/runtimeEnvironment.server.ts) toAuthenticated()
   was documented as coercing Prisma's Decimal `concurrencyLimitBurstFactor`
   to a number but actually passed it through unchanged. Slim AE accepts
   `number | DecimalLike`, so it compiled, but at runtime any consumer
   doing arithmetic would get NaN. Now calls `.toNumber()` explicitly.

2. (internal-packages/rbac/src/fallback.ts) Same issue in the fallback's
   toAuthenticatedEnvironment — was a typed identity function. Now coerces
   the burst factor to number via the same union-narrowing pattern used
   by the EnvironmentQueuePresenter.

3 + 4. (internal-packages/rbac/src/fallback.ts, apps/webapp/app/services/rbac.server.ts)
   The fallback was using the primary PrismaClient for read-only auth-path
   queries (env lookup, revoked-key grace, session user lookup, PAT
   lookup). Pre-PR, findEnvironmentByApiKey used $replica for these.
   Shifting auth-path reads from replica to primary is a real perf
   regression on high-traffic deployments.

   RoleBaseAccessFallback now accepts either a single PrismaClient or
   `{ primary, replica }`. The webapp passes both — `prisma` for writes
   (role mutations) and `$replica` for the read-only authenticate*
   paths. The fallback's role-mutation methods are no-op stubs in the
   plugin-not-installed case, so writes only need the primary
   nominally; the type kept for symmetry.

5. (packages/core/package.json) Two new subpath exports added in this PR
   — `v3/auth/environment` and `v3/utils/gitBranch` — were missing from
   the `typesVersions` block. attw flagged "resolution failed" on both,
   failing the `check-exports` CI job. Added the entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Webapp's vitest config sets `pool: "forks"` and the test script uses
`--no-file-parallelism`, so every test file in a shard executes
sequentially in the same forked Node process. globalThis persists
across files even though vitest clears the module cache between them.

Two places call `provider.register()`:
- `~/v3/tracer.server.ts` via the `setupTelemetry` singleton — sets the
  OTel global APIs (trace/context/propagation).
- `test/utils/tracing.ts#createInMemoryTracing()` — also calls
  `provider.register()` to make `trace.getTracer("test-tracer")`
  return its in-memory provider's tracer.

Once the webapp's tracer.server.ts has been loaded by any test in the
shard, the globals are set. The next test that calls
createInMemoryTracing fails with `@opentelemetry/api: Attempted
duplicate registration of API: trace/context/propagation`. Failed test
stack pointed at `runsBackfiller.test.ts:30` calling
`createInMemoryTracing()` after triggerTask.test.ts (which loads
tracer.server.ts via `~/runEngine/services/triggerTask.server`) ran
first in shard 6.

The same shape failure cascades through the rest of the shard's tests
because the consumer keeps trying and tracer.server.ts can't be
re-singleton'd cleanly after the cache eviction.

Why now: this PR's slim-AuthenticatedEnvironment refactor changed the
import graph so that triggerTask test's transitive imports actually
reach tracer.server.ts in CI's shard 6 distribution. It worked before
because the shard's tests didn't co-load both register() callers. But
the latent issue was always there — fixing the test util is the right
layer.

Fix: drop `provider.register()` from createInMemoryTracing. The
exporter still receives spans because the consumer uses the tracer
returned from `provider.getTracer(...)` directly (provider-scoped, not
global). Tests that genuinely need the global must register their own
provider explicitly and clean up — but none of the current callers
do.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t") that collides with OTel auto-instrumentation

The previous fix in this PR (lazy-load `expect` inside assertNonNullable
via require("vitest")) was intended to keep the module importable from
contexts where vitest isn't initialized at module-load time (a vitest
globalSetup, where the worker doesn't exist yet).

But require() inside the function body collides with OTel's
`@opentelemetry/instrumentation`, which uses `require-in-the-middle`
to hook every Node `require()` call. vitest is ESM-only, so once OTel's
been touched in the same process, the next `require("vitest")` call
throws "Vitest cannot be imported in a CommonJS module using require()".

That instrumentation runs as soon as the run-engine — or any code that
loads its OTel-traced internals — is imported. That's:
- Every run-engine internal test (run-queue, heartbeats, pendingVersion)
  uses assertNonNullable.
- Every webapp test that transitively reaches RunEngine (triggerTask,
  runsBackfiller, plus any test that imports services using engine
  internals) does too.

Each shard ran ~7/8 failing because some early test loaded run-engine,
require-in-the-middle armed, then the next assertNonNullable call
exploded — cascading the rest of the shard's tests via vitest's
fail-fast on uncaught errors in the same process.

Fix: replace the vitest.expect call with a plain throw. Vitest still
gets a useful failure (the message shows in the stack trace) without
the require() hazard. The test that PR #3438 originally needed this
for (a globalSetup that imported assertNonNullable before workers
existed) still works — there's no top-level vitest import any more.

Verified: triggerTask.test.ts (which previously failed locally with
the require-in-the-middle error) now passes 8/8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…okie helper

Background. The RBAC plugin contract took a `helpers.getSessionUserId(request)`
callback at create time. The OSS rbac.server module wired it up by statically
importing `getUserId` from `~/services/session.server`, which transitively
loaded the entire remix-auth pipeline:

  rbac.server.ts
    → session.server.ts → auth.server.ts
    → emailAuth.server.tsx (throws on missing MAGIC_LINK_SECRET at module load)
    → email.server.ts → commonWorker.server.ts (V1 services)
    → marqs/index.server.ts and taskRunConcurrencyTracker.server.ts
      (throw on missing REDIS_HOST/REDIS_PORT at module load)

Any module that pulled `rbac` in — including PAT-only call sites that have
no session-cookie path at all — therefore inherited a hard dependency on
the entire dashboard auth chain and the V1 queue. Webapp tests that mock
`~/env.server` to a stripped object (e.g. personalAccessToken.test.ts) hit
the module-load throws before their assertions ran.

Contract change (packages/plugins/src/rbac.ts).
- `authenticateSession` and `authenticateAuthorizeSession` now take
  `userId: string | null` in their `context` argument.
- `RoleBasedAccessControlPlugin.create()` no longer takes a helpers arg.
- The plugin treats `userId === null` as "no authenticated user"
  (same shape `helpers.getSessionUserId === null` returned).

OSS fallback (internal-packages/rbac/src/fallback.ts) reads
`context.userId` directly. LazyController (internal-packages/rbac/src/index.ts)
drops the helpers parameter through.

Webapp (apps/webapp/app/services/rbac.server.ts) drops the session.server
import entirely. dashboardBuilder.server.ts — the only `authenticateSession`
caller — resolves userId via session.server itself before calling rbac.

Side change (apps/webapp/app/services/scheduleEmail.server.ts new file).
Moved `scheduleEmail` out of email.server.ts to break a parallel chain:
email.server.ts → commonWorker → V1 services → marqs. email.server is
now just the SMTP/Resend client, as it should have been. Three caller
files updated: routes/invite-resend.tsx, routes/_app.orgs.$organizationSlug.invite/route.tsx,
services/mfa/multiFactorAuthentication.server.ts.

Verified locally with `REDIS_HOST="" REDIS_PORT="" pnpm vitest run
test/services/personalAccessToken.test.ts test/engine/triggerTask.test.ts`
— both files pass (11/11) with the env mocked stripped, which was the
CI failure mode on shards 1/2/4/5/6/7. Webapp + plugins + rbac
typecheck clean.

Cloud-repo plugin update is in a coordinated branch (rbac-packages on
APIHero/cloud) — same contract change in
enterprise/plugins/src/rbac/{index,controller,controller.integration.test}.ts.

Changeset: minor for `@trigger.dev/plugins` (contract change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n createInMemoryTracing

The previous form (commit 133d468) skipped `provider.register()` to
avoid "Attempted duplicate registration of API: trace" when another
test in the same shard had already loaded `~/v3/tracer.server.ts`
(which calls register() via its `singleton("opentelemetry", …)`).
That fix went too far: tests like `sentryTraceContext.server.test.ts`
exercise OTel's *global* API (`trace.getActiveSpan()` via
`getActiveTraceIds`), which requires a registered global context
manager. Without register(), `trace.getActiveSpan()` returned
undefined — 6/9 assertions in that file flipped red on shard 1 of
the latest CI run.

Webapp's vitest config uses `pool: "forks"` with
`--no-file-parallelism`, so all test files in a shard share one
process. globalThis persists across files even though vitest clears
the module cache between them. OTel's `setGlobal*` no-ops (logs an
error, returns false) when a global is already set — so the second
caller's register() is a no-op, leaving the globals pointed at the
first caller's provider.

Fix: call `trace.disable(); context.disable(); propagation.disable();`
*before* `provider.register()`. The disables clear any previously-set
globals (whether from `tracer.server.ts`'s singleton or a prior
createInMemoryTracing call), so the test's provider always wins for
both span recording (via SimpleSpanProcessor) AND the global API.
Each createInMemoryTracing call rotates the globals; no leakage to
the next test file.

Verified locally: `pnpm vitest run test/sentryTraceContext.server.test.ts`
9/9 pass. Running it alongside test/engine/triggerTask.test.ts (which
loads tracer.server.ts) reaches the test-setup phase normally —
no more module-load OTel collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
coderabbitai[bot]

This comment was marked as resolved.

…ticated mapper

CodeRabbit flagged: after switching the API-key branch to return the
slim AuthenticatedEnvironment shape (via runtimeEnvironment.server.ts's
`toAuthenticated()` + `authIncludeBase`), the PAT and OAT branches in
`authenticatedEnvironmentForAuthentication()` still ran their own
`runtimeEnvironment.findFirst` calls with a smaller include
(`{ project: true, organization: true }`) and returned the raw Prisma
row. The function's return type narrowed structurally, but the runtime
shape was auth-method-dependent — `concurrencyLimitBurstFactor` came
back as Prisma's `Decimal` for PAT/OAT but `number` for API-key, and
PAT/OAT lacked `orgMember` entirely while API-key had it.

Fix: export the shared `authIncludeBase` / `authIncludeWithParent`
include shapes and `toAuthenticated()` from runtimeEnvironment.server,
then use them in the PAT/OAT branches. PREVIEW envs still override
apiKey with the parent's apiKey before mapping. All four branches now
return the same normalized slim shape — Decimal coerced to number,
columns the slim type doesn't carry stripped, orgMember populated
consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 2 new potential issues.

View 21 additional findings in Devin Review.

Open in Devin Review

Comment thread apps/webapp/app/routes/api.v1.runs.ts
Comment on lines +146 to +157
export const action = dashboardAction(
{
params: Params,
context: async (params) => {
const orgId = await resolveOrgIdFromSlug(params.organizationSlug);
return orgId ? { organizationId: orgId } : {};
},
// No top-level authorization — different intents have different
// requirements (set-role needs manage:members; remove/leave is
// gated by the existing model layer; purchase-seats by the
// SetSeatsAddOnService). Per-intent ability checks happen inside.
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Team page action route has no top-level authorization — different intents checked individually inside

The _app.orgs.$organizationSlug.settings.team action at line 147-157 uses dashboardAction with no authorization option. The comment explains that different form intents (set-role, purchase-seats, remove member) have different authorization requirements. The set-role intent checks ability.can("manage", { type: "members" }) at line 167. The purchase-seats and remove-member intents rely on existing model-layer checks. This means ANY authenticated user can reach the action handler — intent-level gating is the only protection. This is noted as intentional in the comments, but it means a user who can't manage members could still submit purchase-seats forms without an ability check at this layer (relying on the SetSeatsAddOnService for enforcement).

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

…hten team action authz

Two related findings reviewing the RBAC route migration.

1. api.v1.runs / realtime.v1.runs: type-level scope no longer granted unfiltered access
   Pre-RBAC the resource was a plain object (`{ tasks: searchParams[...] }`
   for api.v1.runs, the searchParams object itself for realtime.v1.runs)
   and the legacy `checkAuthorization` iterated `Object.keys`. So a JWT
   with type-level `read:tasks` (api.v1.runs) or `read:tags`
   (realtime.v1.runs) — no id — granted access to the unfiltered list.
   The new `anyResource([…])` model only matches resources we list,
   and the post-migration list dropped the type-level alternative when
   no filter was set, so those JWTs started 403'ing on unfiltered
   listings. Add `{ type: "tasks" }` and `{ type: "tags" }` (no id)
   alongside `{ type: "runs" }` so the legacy semantic is preserved
   verbatim — per-id `read:tasks:foo` / `read:tags:bar` still grants
   only when the filter contains the matching id.

2. team action: explicit per-intent authz gate
   The team route's action handler accepts three intents (set-role,
   purchase-seats, remove-member). set-role already gates on
   `manage:members`, but purchase-seats and remove-member relied on
   model-layer enforcement (SetSeatsAddOnService, removeTeamMember).
   The loader gates on `read:members` (broader audience than
   pre-RBAC's owner/admin-only loader), so any user reaching the page
   could submit those forms — banking on defense in depth at the
   model layer.

   Fix:
   - purchase-seats → require `manage:billing`
   - remove-member → require `manage:members`, with self-leave
     (target.userId === actor.userId) carved out as always allowed
   - removeTeamMember model fn previously only verified the actor was
     a member of the org — any org member could remove any other
     member by id. The new route gate closes that hole.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 25 additional findings in Devin Review.

Open in Devin Review

// `internal-packages/rbac/src/fallback.ts`'s `setTokenRole`). The
// match is how we detect "no plugin installed" and skip the
// compensating delete.
const FALLBACK_NOT_INSTALLED_ERROR = "RBAC plugin not installed";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 FALLBACK_NOT_INSTALLED_ERROR string coupling between personalAccessToken.server.ts and fallback.ts

At apps/webapp/app/services/personalAccessToken.server.ts:29, the constant FALLBACK_NOT_INSTALLED_ERROR = "RBAC plugin not installed" must exactly match internal-packages/rbac/src/fallback.ts:341's setTokenRole return { ok: false, error: "RBAC plugin not installed" }. This string-coupling determines whether a failed setTokenRole triggers a compensating PAT delete (line 399) or is treated as benign (line 393). If the fallback's error string ever drifts, PAT creation would fail with an exception even when no RBAC plugin is installed. Consider exporting the sentinel string from the rbac package to eliminate the coupling.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants