[core][rest] Add branch merge support for append-only tables#7882
[core][rest] Add branch merge support for append-only tables#7882JunRuiLee wants to merge 3 commits into
Conversation
Add mergeBranch(sourceBranch, targetBranch) to Table and Catalog interfaces, with default/delegate implementations. Add branch-merge.enabled @immutable table option to CoreOptions. When enabled, the table is restricted to pure-append commits only, guaranteeing that branch merge is always safe. Add BranchMergeHandler interface and extend BranchManager with mergeBranch() and merge validation utilities.
Implement FileSystemBranchManager.mergeBranch() with: - Symmetric branch-merge.enabled validation on both branches - Append-only table validation (no primary keys) - Row-tracking consistency check between branches - Schema compatibility validation (fields and schema history) - Active-file diff to find branch-only files - DefaultBranchMergeHandler for committing merged files Enforce pure-append invariant when branch-merge.enabled=true: - SchemaValidation rejects primary keys, deletion vectors, and data evolution at table creation time - FileStoreCommitImpl rejects compaction and INSERT OVERWRITE at commit time Add Catalog and REST API support (CatalogBranchManager, RESTCatalog, POST /branches/merge endpoint). Add comprehensive tests covering positive merge, row-tracking, snapshot expiration, schema conflicts, and negative cases. Add branch merge documentation and REST OpenAPI spec.
616f113 to
1a55a59
Compare
JingsongLi
left a comment
There was a problem hiding this comment.
Thanks @JunRuiLee , left comments.
|
|
||
| @Immutable | ||
| public static final ConfigOption<Boolean> BRANCH_MERGE_ENABLED = | ||
| key("branch-merge.enabled") |
There was a problem hiding this comment.
It looks like we don't need this option, just throw exception in branch merging is oK.
There was a problem hiding this comment.
Thanks for the review. I added this option mainly to keep a clear invariant for branch merge: the table history must be pure-append.
My concern is that checking this only when mergeBranch is called may not be reliable, because old snapshots can expire. In that case, we may no longer be able to tell whether compaction or INSERT OVERWRITE happened before, and file-level merge could become unsafe.
So the option is intended to make this an explicit opt-in behavior and keep the table merge-safe from the beginning by rejecting compaction / INSERT OVERWRITE.
Another possible approach could be to persist some state indicating whether unsafe operations have ever happened, but that seems a bit heavier to me.
Please let me know what you think.
There was a problem hiding this comment.
I think enabling it directly is OK, after all, this is a brand new operation that users will not be involved in by default.
| } | ||
|
|
||
| @Override | ||
| public PojoDataFileMeta clearFirstRowId() { |
There was a problem hiding this comment.
newFirstRowId(@Nullable Long newFirstRowId)
| * @throws ForbiddenException Exception thrown on HTTP 403 means don't have the permission for | ||
| * this table | ||
| */ | ||
| public void mergeBranch(Identifier identifier, String sourceBranch, String targetBranch) { |
There was a problem hiding this comment.
Why do we need it as a REST API? Does it feel like just creating a new commit?
There was a problem hiding this comment.
Thanks for the question. Yes, it creates a new commit on the target branch.
I added it for REST catalog because branch merge needs to read and validate metadata from both branches before committing. It is also aligned with fastForward, which is already a REST branch operation.
There was a problem hiding this comment.
Unlike fastforward, merge has a higher overhead and involves many file operations, which is not suitable for being placed on a REST server to complete
Purpose
This PR is the first part of #7863, focusing on the core APIs, implementation, and REST support for branch merge.
It adds
mergeBranch, which incrementally adds data files that exist only in the source branch to the target branch, without replacing either branch. Correctness is guarded by the immutable table option'branch-merge.enabled' = 'true', which enforces the pure-append invariant required by file-level branch merge: compaction andINSERT OVERWRITEare rejected, and deletion vectors are not supported.Tests
Added core tests:
AppendOnlySimpleTableTest#testMergeBranchAppendOnlySimpleTableTest#testMergeBranchMultipleTimesAppendOnlySimpleTableTest#testMergeBranchFailsOnStaleDuplicateCommitAppendOnlySimpleTableTest#testMergeBranchBidirectionalAppendOnlySimpleTableTest#testMergeBranchEmptyDiffAppendOnlySimpleTableTest#testMergeBranchSchemaConflictAppendOnlySimpleTableTest#testMergeBranchSchemaHistoryConflictAppendOnlySimpleTableTest#testMergeBranchRowTrackingTableAppendOnlySimpleTableTest#testMergeBranchRowTrackingMultipleTimesAppendOnlySimpleTableTest#testMergeBranchRowTrackingAfterTargetWritesAppendOnlySimpleTableTest#testMergeBranchRowTrackingBetweenNonMainBranchesAppendOnlySimpleTableTest#testMergeBranchRowTrackingMismatchAppendOnlySimpleTableTest#testMergeBranchRowTrackingStaleMergeAppendOnlySimpleTableTest#testMergeBranchSameBranchAppendOnlySimpleTableTest#testMergeBranchSamePartitionAppendOnlySimpleTableTest#testMergeBranchNonExistentBranchAppendOnlySimpleTableTest#testMergeBranchMultiBucketAppendOnlySimpleTableTest#testMergeBranchNonExistentTargetBranchAppendOnlySimpleTableTest#testMergeBranchBetweenNonMainBranchesAppendOnlySimpleTableTest#testMergeBranchWithoutBranchMergeEnabledAppendOnlySimpleTableTest#testMergeBranchAfterSnapshotExpirationAppendOnlySimpleTableTest#testBranchMergeEnabledRejectsOverwritePrimaryKeySimpleTableTest#testMergeBranchPrimaryKeyTableAdded REST tests:
RESTCatalogTest#testMergeBranchRESTCatalogTest#testBranches