Skip to content

[core][rest] Add branch merge support for append-only tables#7882

Open
JunRuiLee wants to merge 3 commits into
apache:masterfrom
JunRuiLee:branch_merge_final2_part1
Open

[core][rest] Add branch merge support for append-only tables#7882
JunRuiLee wants to merge 3 commits into
apache:masterfrom
JunRuiLee:branch_merge_final2_part1

Conversation

@JunRuiLee
Copy link
Copy Markdown
Contributor

Purpose

This PR is the first part of #7863, focusing on the core APIs, implementation, and REST support for branch merge.

It adds mergeBranch, which incrementally adds data files that exist only in the source branch to the target branch, without replacing either branch. Correctness is guarded by the immutable table option 'branch-merge.enabled' = 'true', which enforces the pure-append invariant required by file-level branch merge: compaction and INSERT OVERWRITE are rejected, and deletion vectors are not supported.

Tests

Added core tests:

  • AppendOnlySimpleTableTest#testMergeBranch
  • AppendOnlySimpleTableTest#testMergeBranchMultipleTimes
  • AppendOnlySimpleTableTest#testMergeBranchFailsOnStaleDuplicateCommit
  • AppendOnlySimpleTableTest#testMergeBranchBidirectional
  • AppendOnlySimpleTableTest#testMergeBranchEmptyDiff
  • AppendOnlySimpleTableTest#testMergeBranchSchemaConflict
  • AppendOnlySimpleTableTest#testMergeBranchSchemaHistoryConflict
  • AppendOnlySimpleTableTest#testMergeBranchRowTrackingTable
  • AppendOnlySimpleTableTest#testMergeBranchRowTrackingMultipleTimes
  • AppendOnlySimpleTableTest#testMergeBranchRowTrackingAfterTargetWrites
  • AppendOnlySimpleTableTest#testMergeBranchRowTrackingBetweenNonMainBranches
  • AppendOnlySimpleTableTest#testMergeBranchRowTrackingMismatch
  • AppendOnlySimpleTableTest#testMergeBranchRowTrackingStaleMerge
  • AppendOnlySimpleTableTest#testMergeBranchSameBranch
  • AppendOnlySimpleTableTest#testMergeBranchSamePartition
  • AppendOnlySimpleTableTest#testMergeBranchNonExistentBranch
  • AppendOnlySimpleTableTest#testMergeBranchMultiBucket
  • AppendOnlySimpleTableTest#testMergeBranchNonExistentTargetBranch
  • AppendOnlySimpleTableTest#testMergeBranchBetweenNonMainBranches
  • AppendOnlySimpleTableTest#testMergeBranchWithoutBranchMergeEnabled
  • AppendOnlySimpleTableTest#testMergeBranchAfterSnapshotExpiration
  • AppendOnlySimpleTableTest#testBranchMergeEnabledRejectsOverwrite
  • PrimaryKeySimpleTableTest#testMergeBranchPrimaryKeyTable

Added REST tests:

  • RESTCatalogTest#testMergeBranch
  • branch-not-exist coverage in RESTCatalogTest#testBranches

JunRuiLee added 2 commits May 17, 2026 12:03
Add mergeBranch(sourceBranch, targetBranch) to Table and Catalog
interfaces, with default/delegate implementations.

Add branch-merge.enabled @immutable table option to CoreOptions.
When enabled, the table is restricted to pure-append commits only,
guaranteeing that branch merge is always safe.

Add BranchMergeHandler interface and extend BranchManager with
mergeBranch() and merge validation utilities.
Implement FileSystemBranchManager.mergeBranch() with:
- Symmetric branch-merge.enabled validation on both branches
- Append-only table validation (no primary keys)
- Row-tracking consistency check between branches
- Schema compatibility validation (fields and schema history)
- Active-file diff to find branch-only files
- DefaultBranchMergeHandler for committing merged files

Enforce pure-append invariant when branch-merge.enabled=true:
- SchemaValidation rejects primary keys, deletion vectors, and
  data evolution at table creation time
- FileStoreCommitImpl rejects compaction and INSERT OVERWRITE
  at commit time

Add Catalog and REST API support (CatalogBranchManager, RESTCatalog,
POST /branches/merge endpoint).

Add comprehensive tests covering positive merge, row-tracking,
snapshot expiration, schema conflicts, and negative cases.

Add branch merge documentation and REST OpenAPI spec.
@JunRuiLee JunRuiLee force-pushed the branch_merge_final2_part1 branch from 616f113 to 1a55a59 Compare May 18, 2026 04:04
Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JunRuiLee , left comments.


@Immutable
public static final ConfigOption<Boolean> BRANCH_MERGE_ENABLED =
key("branch-merge.enabled")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we don't need this option, just throw exception in branch merging is oK.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I added this option mainly to keep a clear invariant for branch merge: the table history must be pure-append.

My concern is that checking this only when mergeBranch is called may not be reliable, because old snapshots can expire. In that case, we may no longer be able to tell whether compaction or INSERT OVERWRITE happened before, and file-level merge could become unsafe.

So the option is intended to make this an explicit opt-in behavior and keep the table merge-safe from the beginning by rejecting compaction / INSERT OVERWRITE.

Another possible approach could be to persist some state indicating whether unsafe operations have ever happened, but that seems a bit heavier to me.

Please let me know what you think.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think enabling it directly is OK, after all, this is a brand new operation that users will not be involved in by default.

}

@Override
public PojoDataFileMeta clearFirstRowId() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newFirstRowId(@Nullable Long newFirstRowId)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* @throws ForbiddenException Exception thrown on HTTP 403 means don't have the permission for
* this table
*/
public void mergeBranch(Identifier identifier, String sourceBranch, String targetBranch) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need it as a REST API? Does it feel like just creating a new commit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the question. Yes, it creates a new commit on the target branch.

I added it for REST catalog because branch merge needs to read and validate metadata from both branches before committing. It is also aligned with fastForward, which is already a REST branch operation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike fastforward, merge has a higher overhead and involves many file operations, which is not suitable for being placed on a REST server to complete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants