
[python] Add system tables #7908

Open
TheR1sing3un wants to merge 20 commits into apache:master from TheR1sing3un:py-pypaimon-system-tables-phase-1

TheR1sing3un (Member) commented on May 19, 2026

Summary

Adds native PyPaimon access to eight core system tables, matching the Java implementation column for column.

  • New `pypaimon.table.system` package: `SystemTable` base + `SystemTableLoader` registry + in-memory read pipeline (`SystemReadBuilder` / `SystemTableScan` / `SystemTableRead`).
  • `FilesystemCatalog.get_table` and `RESTCatalog.get_table` route `$`-suffixed identifiers to the loader; non-system requests are unchanged.
  • Tables implemented: `snapshots`, `schemas`, `options`, `manifests`, `files`, `partitions`, `tags`, `branches`. Schema, nullability and primary keys match Java's `TABLE_TYPE`.
  • Manager helpers added where needed: `SnapshotManager.list_snapshots`, `SchemaManager.list_all`, `BranchManager.branch_create_time`.
  • Predicate pushdown is not implemented yet: `with_filter` raises `NotImplementedError` rather than dropping the filter silently. A few columns are emitted as NULL/0 placeholders (documented).
  • User docs: `docs/content/pypaimon/system-tables.md`.

Test plan

  • `pytest pypaimon/tests/system/` (73 tests, all green)
  • Regression: `pytest pypaimon/tests/filesystem_catalog_test.py pypaimon/tests/filesystem_catalog_branch_test.py pypaimon/tests/filesystem_catalog_tag_test.py pypaimon/tests/snapshot_manager_test.py pypaimon/tests/schema_manager_test.py pypaimon/tests/branch_manager_test.py pypaimon/tests/branch/ pypaimon/tests/rest/rest_catalog_test.py` (102 tests, all green)
  • `bash dev/lint-python.sh -i license,flake8`

Adds an abstract SystemTable subclass of Table that exposes a base
data table's metadata as a read-only table. Write and search builders
raise NotImplementedError; subclasses implement system_table_name(),
row_type(), and _build_arrow_table() to materialise their contents.

The Identifier encodes the system table suffix (and branch segment
when present) so downstream callers see a stable on-wire shape.
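
The shape described above can be illustrated with a minimal, standard-library-only sketch. It is not the PR's code: `SystemTableSketch`, `OptionsSketch`, `_build_rows` and `new_batch_write_builder` are hypothetical stand-ins, rows are plain dictionaries rather than an Arrow table, and the branch segment of the identifier is left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SystemTableSketch(ABC):
    """Read-only view over a base table's metadata (illustrative only)."""

    def __init__(self, base_table_name: str):
        self.base_table_name = base_table_name

    @abstractmethod
    def system_table_name(self) -> str:
        """Suffix that follows '$' in the identifier, e.g. 'options'."""

    @abstractmethod
    def row_type(self) -> List[str]:
        """Column names, in order."""

    @abstractmethod
    def _build_rows(self) -> List[Dict[str, object]]:
        """Materialise the whole table in memory."""

    def full_name(self) -> str:
        # The identifier carries the system-table suffix after '$'.
        return f"{self.base_table_name}${self.system_table_name()}"

    def new_batch_write_builder(self):
        # Hypothetical write entry point: system tables are read-only.
        raise NotImplementedError(f"{self.full_name()} is read-only")


class OptionsSketch(SystemTableSketch):
    """One row per (key, value) option of the base table."""

    def __init__(self, base_table_name: str, options: Dict[str, str]):
        super().__init__(base_table_name)
        self._options = dict(options)

    def system_table_name(self) -> str:
        return "options"

    def row_type(self) -> List[str]:
        return ["key", "value"]

    def _build_rows(self) -> List[Dict[str, object]]:
        return [{"key": k, "value": v} for k, v in sorted(self._options.items())]
```

A subclass only supplies its name, its columns and its rows; the read-only guard lives once in the base class.
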
Introduces SYSTEM_TABLE_LOADERS, a name-to-factory dictionary that
mirrors the subset of Java's SystemTableLoader supported by the
Python SDK. Each factory is a lazy import so a new system table only
requires its own module plus a registry entry.

The eight registered names are snapshots, schemas, options, manifests,
files, partitions, tags, and branches; deferred Java entries (audit_log,
binlog, read_optimized, consumers, statistics, aggregation_fields,
buckets, file_key_ranges, table_indexes, row_tracking, all_tables,
all_partitions, all_table_options, catalog_options) are listed in the
module docstring so their omission stays visible.
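
A name-to-factory registry of this kind might look roughly like the sketch below. The names `LOADERS`, `load_system_table` and the two factories are hypothetical, and the factories return placeholder tuples instead of real table objects; in a real package each factory body would hold the deferred `import` of its implementing module.

```python
from typing import Callable, Dict, Optional, Tuple


def _snapshots_factory(base_table: object) -> Tuple[str, object]:
    # A real factory would import its implementing module here, so the
    # import only runs when this system table is first requested.
    return ("snapshots", base_table)


def _options_factory(base_table: object) -> Tuple[str, object]:
    return ("options", base_table)


# Hypothetical registry: adding a table means one module plus one entry.
LOADERS: Dict[str, Callable[[object], Tuple[str, object]]] = {
    "snapshots": _snapshots_factory,
    "options": _options_factory,
}


def load_system_table(name: str, base_table: object) -> Optional[Tuple[str, object]]:
    """Return the wrapped table, or None when the name is not registered."""
    factory = LOADERS.get(name)
    return factory(base_table) if factory is not None else None
```

Returning `None` for an unregistered name leaves the choice of exception to the caller, which in this sketch would be the catalog.
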
…temCatalog

FilesystemCatalog.get_table now detects system-table identifiers,
loads the underlying data table from the bare name, and asks
SystemTableLoader to wrap it. Unknown system names raise
TableNotExistException so callers see a consistent contract whether
the base table or the system view is missing.

The existing data-table flow moves into a private _load_data_table
helper to avoid recursion through the dispatching entry point.
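
The dispatch described here can be sketched as follows. This is an assumption-laden illustration rather than the catalog's actual code: `CatalogSketch`, `TableNotExistError` and `split_identifier` are hypothetical, tables are plain values in a dictionary, and branch segments in the identifier are not handled.

```python
from typing import Dict, Iterable, Optional, Tuple


class TableNotExistError(Exception):
    """Stand-in for the catalog's table-not-found exception."""


def split_identifier(full_name: str) -> Tuple[str, Optional[str]]:
    """Split 'db.table$system' into ('db.table', 'system' or None)."""
    base, separator, system = full_name.partition("$")
    return base, (system if separator else None)


class CatalogSketch:
    def __init__(self, tables: Dict[str, object], system_names: Iterable[str]):
        self._tables = dict(tables)
        self._system_names = set(system_names)

    def get_table(self, full_name: str) -> object:
        base, system = split_identifier(full_name)
        # Both paths load the data table through the same private helper,
        # so the dispatching entry point never calls itself.
        data_table = self._load_data_table(base)
        if system is None:
            return data_table
        if system not in self._system_names:
            raise TableNotExistError(full_name)
        return {"system": system, "base": data_table}

    def _load_data_table(self, base: str) -> object:
        try:
            return self._tables[base]
        except KeyError:
            raise TableNotExistError(base) from None
```

Under these assumptions a missing base table and an unknown system name fail with the same exception type, which is the contract the commit message describes.
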
…alog

Mirrors the dispatch change in FilesystemCatalog: RESTCatalog.get_table
now branches on identifier.is_system_table(). System-table requests
fetch the underlying data table by its bare name, hand it to
SystemTableLoader, and surface TableNotExistException when no
implementation matches.

The data-table flow stays unchanged behind a new _load_data_table
helper so existing callers (and the dispatch path) share the same
metadata-loading code.

Wires SystemTable.new_read_builder to a SystemReadBuilder whose
new_scan / new_read pair materialises the entire table as a single
PyArrow-backed split, then exposes to_arrow, to_pandas, to_iterator,
to_record_batch_iterator and to_duckdb so users reach for the same
API as on a regular data table.

Subclasses override _build_arrow_table(); everything else (projection,
limit, predicate-builder construction) is shared. with_filter is
preserved on the builder but the read raises NotImplementedError when
a predicate is set, so filters never get silently dropped.
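
The builder semantics above (shared projection and limit, filters rejected at read time) can be modelled with a toy class. `ReadBuilderSketch` and its `read` method are hypothetical and operate on plain dictionaries; the PR's builder goes through a scan/read pair over a PyArrow table instead.

```python
from typing import Dict, Iterable, List, Optional


class ReadBuilderSketch:
    """Toy builder over in-memory rows; not the PR's SystemReadBuilder."""

    def __init__(self, rows: Iterable[Dict[str, object]]):
        self._rows = [dict(row) for row in rows]
        self._columns: Optional[List[str]] = None
        self._limit: Optional[int] = None
        self._predicate: Optional[object] = None

    def with_projection(self, columns: List[str]) -> "ReadBuilderSketch":
        self._columns = list(columns)
        return self

    def with_limit(self, limit: int) -> "ReadBuilderSketch":
        self._limit = limit
        return self

    def with_filter(self, predicate: object) -> "ReadBuilderSketch":
        # Accepted on the builder, rejected when the read happens.
        self._predicate = predicate
        return self

    def read(self) -> List[Dict[str, object]]:
        if self._predicate is not None:
            # Failing loudly avoids returning unfiltered rows by accident.
            raise NotImplementedError("filters are not supported here")
        rows = self._rows if self._limit is None else self._rows[: self._limit]
        if self._columns is None:
            return [dict(row) for row in rows]
        return [{name: row[name] for name in self._columns} for row in rows]
```

Because every setting lives on the builder instance, projection and limit only take effect when the read is issued from the same builder they were set on.
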
SnapshotManager.list_snapshots enumerates every persisted snapshot in
ID order, skipping IDs whose file has been expired. SchemaManager.list_all
returns every committed schema in ID order. BranchManager grows a
default branch_create_time accessor that returns None; the filesystem
implementation overrides it to expose the branch directory's mtime in
epoch milliseconds, falling back to None when the underlying file
status can't supply one.
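
One way to enumerate snapshots while skipping expired IDs is to walk the ID range and keep only those whose file still exists. The sketch below assumes files named `snapshot-<id>` in a local directory and takes the earliest and latest IDs as arguments; `list_snapshot_ids` is a hypothetical helper, and the real manager reads through the table's file I/O layer rather than `pathlib`.

```python
import tempfile
from pathlib import Path
from typing import List


def list_snapshot_ids(snapshot_dir: Path, earliest: int, latest: int) -> List[int]:
    """Return IDs in ascending order, skipping IDs with no file on disk."""
    return [
        snapshot_id
        for snapshot_id in range(earliest, latest + 1)
        if (snapshot_dir / f"snapshot-{snapshot_id}").is_file()
    ]


# Usage: snapshot 2 has been expired, so only 1 and 3 remain.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for snapshot_id in (1, 3):
        (directory / f"snapshot-{snapshot_id}").write_text("{}")
    remaining = list_snapshot_ids(directory, earliest=1, latest=3)
```
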
OptionsTable returns one row per (key, value) of the latest table
schema's options, matching the Java OptionsTable column layout
(both columns NOT NULL, "key" as the primary key).

BranchesTable lists every named branch with the branch directory's
modification time. When the underlying store cannot supply an mtime
(some remote object stores, REST-managed branches) the value falls
back to epoch 0 so the NOT NULL contract from Java's TABLE_TYPE
holds.
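
The mtime-with-fallback behaviour can be sketched with the standard library. Here `os.stat` stands in for the table's file I/O layer and an `OSError` stands in for "the store cannot supply an mtime"; both are assumptions, and `directory_mtime_millis` is a hypothetical helper.

```python
import os
import tempfile


def directory_mtime_millis(path: str) -> int:
    """Modification time in epoch milliseconds, or 0 when unavailable."""
    try:
        return os.stat(path).st_mtime_ns // 1_000_000
    except OSError:
        # Fall back to epoch 0 so a NOT NULL column always has a value.
        return 0


# Usage: pin a directory's mtime, then read it back in milliseconds.
with tempfile.TemporaryDirectory() as branch_dir:
    pinned_ns = 1_700_000_000 * 10**9
    os.utime(branch_dir, ns=(pinned_ns, pinned_ns))
    known = directory_mtime_millis(branch_dir)
    missing = directory_mtime_millis(os.path.join(branch_dir, "no-such-branch"))
```
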
TagsTable surfaces every tag's snapshot id, schema id, commit time
and record count. create_time and time_retained are emitted as NULL
because pypaimon's Tag dataclass does not yet carry those fields,
the same compromise that FileSystemCatalog.get_tag makes. Schema (including
NOT NULL / NULL distinctions and the tag_name primary key) matches
Java's TagsTable column for column.

SchemasTable returns one row per committed schema version, with the
fields / partition_keys / primary_keys / options columns encoded as
compact JSON strings (matching the column layout and nullability of
Java's SchemasTable). update_time is the schema's own time_millis.

SnapshotsTable returns one row per persisted snapshot in ascending
ID order, matching the Java SnapshotsTable column layout. NOT NULL
columns (snapshot_id, schema_id, commit_user, ...) and NULLABLE
columns (changelog_manifest_list, watermark, next_row_id, ...) line
up with Java's TABLE_TYPE. snapshot_id is the primary key.

ManifestsTable lists every manifest referenced by the latest snapshot
(base + delta + changelog), matching Java's column layout: file_name,
file_size, num_added_files, num_deleted_files, schema_id, min/max
partition stats, and min/max row id. The two partition-stat columns
are emitted as NULL placeholders until pypaimon grows a shared
partition row-to-string helper; the column shape is preserved so the
schema contract stays bit-equal with Java.

PartitionsTable aggregates the latest snapshot's manifest entries by
partition spec, returning record_count, file_size_in_bytes, file_count,
last_update_time, and total_buckets. Catalog-owned columns
(created_at, created_by, updated_by, options, done) are filled with
placeholders for the filesystem path; REST-backed catalogs will
populate them via the catalog API in a later phase.
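
The per-partition aggregation can be sketched as a single pass over file entries. The names `FileEntrySketch` and `aggregate_by_partition` are hypothetical. Treating last_update_time as the maximum file creation time is an assumption of this sketch, and total_buckets and the catalog-owned columns are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class FileEntrySketch:
    partition: str          # already rendered, e.g. "dt=2024-01-01"
    row_count: int
    file_size: int
    creation_time_ms: int


def aggregate_by_partition(entries: Iterable[FileEntrySketch]) -> Dict[str, Dict[str, int]]:
    """Fold file entries into one statistics row per partition."""
    stats: Dict[str, Dict[str, int]] = {}
    for entry in entries:
        row = stats.setdefault(
            entry.partition,
            {"record_count": 0, "file_size_in_bytes": 0,
             "file_count": 0, "last_update_time": 0},
        )
        row["record_count"] += entry.row_count
        row["file_size_in_bytes"] += entry.file_size
        row["file_count"] += 1
        # Assumption: the newest file determines the partition's update time.
        row["last_update_time"] = max(row["last_update_time"], entry.creation_time_ms)
    return stats
```
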
FilesTable emits one row per ADD entry surviving the latest snapshot's
manifests. Columns match Java's FilesTable 1:1 including the camelCase
"deleteRowCount" wire name and the trailing ARRAY<STRING> write_cols.
null_value_counts, min_value_stats and max_value_stats are serialised
as compact JSON dictionaries keyed by column name (using the file's
value_stats_cols when present), partition is rendered "k=v/k2=v2",
and min/max_key fall back to NULL for tables without a primary key.
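
The two rendering conventions mentioned above, "k=v/k2=v2" partitions and compact JSON statistics keyed by column name, can be illustrated with two small helpers. `render_partition` and `stats_json` are hypothetical names, and the sketch assumes the partition spec and the per-column values are already available as plain Python objects.

```python
import json
from typing import Dict, Sequence


def render_partition(spec: Dict[str, object]) -> str:
    """Render a partition spec as 'k=v/k2=v2', keeping key order."""
    return "/".join(f"{key}={value}" for key, value in spec.items())


def stats_json(columns: Sequence[str], values: Sequence[object]) -> str:
    """Serialise per-column statistics as compact JSON keyed by column name."""
    # separators=(",", ":") drops the default spaces after ',' and ':'.
    return json.dumps(dict(zip(columns, values)), separators=(",", ":"))
```

For example, `render_partition({"dt": "2024-01-01", "hr": "10"})` gives `dt=2024-01-01/hr=10`, and `stats_json(["id", "name"], [0, 2])` gives `{"id":0,"name":2}`.
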
Pins the AbstractCatalog-equivalent contract: list_tables('db') returns
the base table names with no '$'-suffixed entries, but
get_table('db.t$<name>') still resolves every registered system
table. Catches future regressions where a listing implementation
might start exposing internal directories.

Adds docs/content/pypaimon/system-tables.md covering the eight tables
shipped in phase 1 (snapshots, schemas, options, manifests, files,
partitions, tags, branches): column layout (with nullability and
primary keys), the rendering conventions for partitions and
stats-JSON, and the known limitations relative to the Java runtime
(predicate pushdown unsupported, several placeholder columns).
…d docs

Replaces "phase 1" / "phase 2" / "later phase" wording with neutral
descriptions of current behaviour, and renames test fixtures /
identifiers that embedded the same internal vocabulary. No behaviour
change.

TheR1sing3un changed the title from "[python] Add system tables (phase 1)" to "[python] Add system tables" on May 19, 2026

The original snippet built two separate read builders, which silently
discards any with_projection / with_limit set on the first one. Reuse a
single builder for the scan and the read, call to_pandas(splits)
directly, and add a small example showing projection + limit chained on
the same builder.

Java's BranchesTable reads branch directory mtimes through FileIO and
the static BranchManager.branchPath helper; the BranchManager
interface itself has no branch_create_time method. Mirror that:
inline the mtime read in BranchesTable, drop the
BranchManager.branch_create_time API together with its
FileSystemBranchManager override and its dedicated tests.

The previous shape returned None from CatalogBranchManager (the REST
binding) which surfaced epoch 0 for every branch under a REST
catalog. With this change FS and REST share the same code path: the
table's configured FileIO returns the real mtime in both cases.

The class docstrings, column comments, doc page and test names
described every table as "mirroring" or "matching" the other runtime.
Rewrite them to describe what each table is on its own; rename
test_schema_matches_*_table to test_schema_column_layout. No behaviour
change.
