Skip to content

fix(controller): skip DB migration when DB schema is ahead of binary#1963

Open
onematchfox wants to merge 1 commit into
kagent-dev:mainfrom
onematchfox:fix/migrations-compat-mode
Open

fix(controller): skip DB migration when DB schema is ahead of binary#1963
onematchfox wants to merge 1 commit into
kagent-dev:mainfrom
onematchfox:fix/migrations-compat-mode

Conversation

@onematchfox
Copy link
Copy Markdown
Contributor

When an older binary starts against a database that a newer binary has already migrated, calling mg.Up() fails because golang-migrate tries to read the down file for the current DB version, which does not exist in the older binary's embedded FS. This results in the controller crash looping (issue #1881).

Instead, detect when the database version exceeds the binary's highest known migration and return early. This relies on the expand-then-contract policy documented in database-migrations.md, which guarantees each release's code is compatible with the schema applied by the previous release.

Adds TestApplyDir_SucceedsWhenDBVersionAhead to cover this path.

Copilot AI review requested due to automatic review settings June 4, 2026 14:55
@github-actions github-actions Bot added the bug Something isn't working label Jun 4, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR makes the migration runner tolerant of running an older binary against a database schema migrated by a newer binary, avoiding crash-loops by skipping migration application when the DB version is ahead of the binary’s embedded migrations.

Changes:

  • Add a “compatibility mode” early-return in applyDir when DB schema version exceeds the binary’s max embedded migration version.
  • Introduce maxEmbeddedVersion to compute the highest embedded migration version from the filesystem.
  • Add a regression test ensuring applyDir succeeds (and does not modify schema version) when the DB version is ahead.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
go/core/pkg/migrations/runner.go Adds compatibility-mode logic and a helper to scan embedded migrations for the max version.
go/core/pkg/migrations/runner_test.go Adds a test proving older binaries don’t error or roll back when DB schema is ahead.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread go/core/pkg/migrations/runner.go
Comment thread go/core/pkg/migrations/runner.go
Comment thread go/core/pkg/migrations/runner.go Outdated
Comment thread go/core/pkg/migrations/runner_test.go
@onematchfox onematchfox force-pushed the fix/migrations-compat-mode branch from 40aad83 to 9382b86 Compare June 4, 2026 14:58
@onematchfox onematchfox requested a review from Copilot June 4, 2026 14:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread go/core/pkg/migrations/runner.go
Comment thread go/core/pkg/migrations/runner.go
Comment thread go/core/pkg/migrations/runner_test.go
@onematchfox onematchfox force-pushed the fix/migrations-compat-mode branch from 9382b86 to f39b229 Compare June 4, 2026 15:06
@onematchfox onematchfox requested a review from Copilot June 4, 2026 15:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Comment thread go/core/pkg/migrations/runner.go
Comment thread go/core/pkg/migrations/runner.go
Comment thread go/core/pkg/migrations/runner_test.go
When an older binary starts against a database that a newer binary has
already migrated, calling mg.Up() fails because golang-migrate tries to
read the down file for the current DB version, which does not exist in
the older binary's embedded FS. This caused a crash loop (issue kagent-dev#1881).

Instead, detect when the database version exceeds the binary's highest
known migration and return early. This relies on the expand-then-contract
policy documented in database-migrations.md, which guarantees each
release's code is compatible with the schema applied by the previous
release.

Adds TestApplyDir_SucceedsWhenDBVersionAhead to cover this path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Brian Fox <878612+onematchfox@users.noreply.github.com>
@onematchfox onematchfox force-pushed the fix/migrations-compat-mode branch from f39b229 to aa49e3e Compare June 5, 2026 10:11
Comment thread go/core/pkg/migrations/runner.go
iplay88keys
iplay88keys previously approved these changes Jun 5, 2026
Copy link
Copy Markdown
Contributor

@iplay88keys iplay88keys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be changed when we update the db migration/rollback logic, but this is a good incremental improvement in the meantime.

@iplay88keys iplay88keys self-requested a review June 5, 2026 21:06
@iplay88keys iplay88keys dismissed their stale review June 5, 2026 21:06

Thinking more about this so we don't introduce a footgun here.

@iplay88keys
Copy link
Copy Markdown
Contributor

Sorry for un-approving this PR. I want to make sure the implications of this change are well thought through.

Let's start with aligning on a few things:

  • We only support n+1 upgrades and n-1 downgrades.
  • We also don't currently have a guarantee that a downgrade doesn't require manual steps, but we try to avoid them on upgrades.

The current db migrations are set up such that the application fails to start if there are more migrations applied to the db than the binary knows about. This is by design to prevent data corruption.

There is no way for the binary to know what migrations are in n+1 vs n+2 without going back and editing the old artifacts or making this incompatible with airgapped environments.

This PR removes the aforementioned check (comparing known migrations vs what's migrated), effectively introducing a requirement that Kagent stays compatible with future database schema shape changes and queries. It also leaves a subtle footgun as the information regarding the "compatibility mode" is only surfaced in a log line.

Even with this change, there is still a manual database rollback requirement in order to not be in an inconsistent state. So I'm not sure that it's worth taking on these additional support burdens in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants