When a snapshot is categorized as INDIRECT_NON_BREAKING, dev environment runs (sqlmesh run dev) repeatedly backfill the same intervals on every invocation. Completed work is never recognized.
The root cause is an inconsistency in how three subsystems treat INDIRECT_NON_BREAKING snapshots. These snapshots are is_representative=True but is_deployable=False, and different parts of the codebase use different predicates:
| Decision |
Predicate used |
Result |
Effect |
| Which table does the dev view point to? |
is_representative |
points to prod table i.e sqlmesh__<schema>.<model>__<version> |
View reads from prod table |
| Which intervals are checked for missing work? |
is_representative |
intervals (prod ledger) |
Missing work computed against prod |
| Where is the evaluated data written? |
is_deployable |
sqlmesh__<schema>.<model>__<dev_version>__dev (dev table, __dev suffix) |
Data written to dev table |
| Where is completed work recorded? |
is_deployable |
dev_intervals (dev ledger) |
Completions written to dev ledger |
Because missing-interval detection reads intervals but successful evaluation writes to dev_intervals, completed work is never visible to the next run.
Reproduction
- Have a model
B that depends on model A.
- Make a non-breaking change to
A and apply a plan. B gets categorized as INDIRECT_NON_BREAKING.
- Both
prod and dev environments now point to the same snapshot of B.
- Run
sqlmesh run dev --ignore-cron. It backfills some intervals for B.
- Run
sqlmesh run dev --ignore-cron again immediately. It backfills the same intervals.
- Repeat indefinitely -- same result every time.
Code references
INDIRECT_NON_BREAKING is unconditionally non-deployable - DeployabilityIndex.create() in sqlmesh/core/snapshot/definition.py:
if (
snapshot.is_forward_only
or snapshot.is_indirect_non_breaking # <-- unconditional
or is_forward_only_model
or has_auto_restatement
or not is_valid_start
):
this_deployable = False
But it is marked as representative - same method, immediately below:
if not snapshot.is_paused or (
snapshot.is_indirect_non_breaking and snapshot.intervals
):
representative_shared_version_ids.add(node)
Missing-interval detection uses is_representative - Snapshot.missing_intervals() in sqlmesh/core/snapshot/definition.py:
intervals = (
self.intervals if deployability_index.is_representative(self) else self.dev_intervals
)
Interval recording uses is_deployable - PlanEvaluator in sqlmesh/core/plan/evaluator.py:
is_deployable = deployability_index.is_deployable(snapshot)
# ...
intervals=intervals if is_deployable else [],
dev_intervals=intervals if not is_deployable else [],
View promotion uses is_representative - SnapshotEvaluator._promote_snapshot() in sqlmesh/core/snapshot/evaluator.py:
table_name = snapshot.table_name(deployability_index.is_representative(snapshot))
Impact
- Dev runs are never idempotent for affected models. Every run re-processes the same intervals.
- The dev table (
__dev suffix) receives data that no view ever points to. The work is wasted.
- CI pipelines using
sqlmesh run dev will always perform full backfills regardless of prior runs.
- This is not version-specific; the same code pattern exists across multiple SQLMesh releases.
Expected behavior
For a given snapshot, these three decisions should be consistent:
- The table the dev view points to,
- The interval ledger checked for missing work, and
- The interval ledger updated after successful work
should all refer to the same physical table (either prod or dev), not a mix of both.
Environment
- SQLMesh version: 0.234.1 (also confirmed in 0.231.1 source)
- Engine: DuckDB (but the issue is engine-independent; it's in the core snapshot/plan logic)
Note: AI was used to both triage this bug and to document the issue. However, the code paths and conclusions were manually inspected and validated.
When a snapshot is categorized as
INDIRECT_NON_BREAKING, dev environment runs (sqlmesh run dev) repeatedly backfill the same intervals on every invocation. Completed work is never recognized.The root cause is an inconsistency in how three subsystems treat
INDIRECT_NON_BREAKINGsnapshots. These snapshots areis_representative=Truebutis_deployable=False, and different parts of the codebase use different predicates:is_representativesqlmesh__<schema>.<model>__<version>is_representativeintervals(prod ledger)is_deployablesqlmesh__<schema>.<model>__<dev_version>__dev(dev table,__devsuffix)is_deployabledev_intervals(dev ledger)Because missing-interval detection reads
intervalsbut successful evaluation writes todev_intervals, completed work is never visible to the next run.Reproduction
Bthat depends on modelA.Aand apply a plan.Bgets categorized asINDIRECT_NON_BREAKING.prodanddevenvironments now point to the same snapshot ofB.sqlmesh run dev --ignore-cron. It backfills some intervals forB.sqlmesh run dev --ignore-cronagain immediately. It backfills the same intervals.Code references
INDIRECT_NON_BREAKINGis unconditionally non-deployable -DeployabilityIndex.create()insqlmesh/core/snapshot/definition.py:But it is marked as representative - same method, immediately below:
Missing-interval detection uses
is_representative-Snapshot.missing_intervals()insqlmesh/core/snapshot/definition.py:Interval recording uses
is_deployable-PlanEvaluatorinsqlmesh/core/plan/evaluator.py:View promotion uses
is_representative-SnapshotEvaluator._promote_snapshot()insqlmesh/core/snapshot/evaluator.py:Impact
__devsuffix) receives data that no view ever points to. The work is wasted.sqlmesh run devwill always perform full backfills regardless of prior runs.Expected behavior
For a given snapshot, these three decisions should be consistent:
should all refer to the same physical table (either prod or dev), not a mix of both.
Environment
Note: AI was used to both triage this bug and to document the issue. However, the code paths and conclusions were manually inspected and validated.