
[core] introduce Placeholder for Blob File Format#7889

Open
steFaiz wants to merge 5 commits into
apache:masterfrom
steFaiz:placeholder_blob

Conversation

@steFaiz
Contributor

@steFaiz steFaiz commented May 18, 2026

Purpose

This is the first part of #7881. It includes:

  1. Bump the Blob File Format to V2, introducing a placeholder blob.
  2. Introduce a fallbackReader for blobs that skips placeholders. This is a two-level abstraction:
    a. First, all data files are divided into groups according to max_seq_num.
    b. Within each group, a sequential reader logically concatenates the files and fills the missing gaps. For example, if the full row range of the normal files is [0, 100] but a group has only one file with range [20, 80], the output is: [0, 19] -> filled with placeholders; [20, 80] -> records from the file; [81, 100] -> filled with placeholders.
    c. A reader is created for each group, and each blob is read from the group with the largest max_seq_num whose value is NOT a placeholder.
The mechanism is illustrated below:
[Image: illustration of the fallback read mechanism]

Tests

ITCase and Unit tests

@steFaiz steFaiz marked this pull request as draft May 18, 2026 11:17
@steFaiz steFaiz changed the title [core] introduce Placeholder for Blob File Format [wip][core] introduce Placeholder for Blob File Format May 18, 2026
@steFaiz steFaiz marked this pull request as ready for review May 19, 2026 06:19
@steFaiz steFaiz changed the title [wip][core] introduce Placeholder for Blob File Format [core] introduce Placeholder for Blob File Format May 19, 2026
