Skip to content

fix: tables schema command replace manual _delta_log JSON parsing with DeltaTable.schema() from the deltalake library#229

Open
pkontek wants to merge 33 commits into
microsoft:mainfrom
SoletPL:tables_schema_fix
Open

fix: tables schema command replace manual _delta_log JSON parsing with DeltaTable.schema() from the deltalake library#229
pkontek wants to merge 33 commits into
microsoft:mainfrom
SoletPL:tables_schema_fix

Conversation

@pkontek

@pkontek pkontek commented Apr 30, 2026

Copy link
Copy Markdown

📥 Pull Request

✨ Description of new changes

Summary: The fab tables schema command failed with [InvalidDeltaTable] Failed to extract the table schema for Delta tables where pre-checkpoint JSON commit log files had been cleaned up. The previous implementation manually scanned _delta_log/*.json files via the OneLake API looking for a metaData entry — an approach that breaks once Delta Lake compacts logs into .checkpoint.parquet files and removes the preceding JSON entries.

Context: Delta Lake creates a checkpoint every 10 transactions by default and retains log files according to the configured retention policy. Once older JSON commit files are removed, the metaData entry (which carries the schema) is no longer available in the remaining JSON logs, causing the command to fail even for healthy, accessible tables.

Dependencies: Adds deltalake>=0.18.0 as a new dependency. The deltalake library correctly resolves the table schema from both JSON logs and Parquet checkpoints using the standard Delta protocol, making the implementation robust and protocol-compliant.

Changes:

  • Replace manual _delta_log JSON parsing with DeltaTable.schema() from the deltalake library
  • Add deltalake>=0.18.0 to project dependencies in pyproject.toml
  • Simplify fab_tables_schema.py by removing ~50 lines of fragile log-parsing logic
  • Correct typo in error constant for invalid Delta table

Copilot AI review requested due to automatic review settings April 30, 2026 13:17
@pkontek pkontek requested a review from a team as a code owner April 30, 2026 13:17

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes fab tables schema failing on Delta tables whose pre-checkpoint JSON logs were vacuumed, by switching schema extraction to the Delta protocol via the deltalake library.

Changes:

  • Replaced manual _delta_log/*.json scanning with deltalake.DeltaTable(...).schema() for checkpoint-aware schema extraction.
  • Added deltalake>=0.18.0 to project dependencies.
  • Renamed typo’d error constant ERROR_INVALID_DETLA_TABLE to ERROR_INVALID_DELTA_TABLE and wired it into the tables schema command.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/fabric_cli/core/fab_constant.py Fixes typo in error constant name; minor formatting cleanup.
src/fabric_cli/commands/tables/fab_tables_schema.py Refactors schema retrieval to use deltalake over ABFSS + token, removing fragile log parsing.
pyproject.toml Adds deltalake dependency required for robust schema extraction.
.changes/unreleased/fixed-20260430-130558.yaml Adds release note entry for the schema extraction fix.

Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 30, 2026 13:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@pkontek pkontek changed the title Tables schema fix fix: tables schema command replace manual _delta_log JSON parsing with DeltaTable.schema() from the deltalake library Apr 30, 2026
@pkontek pkontek requested a review from Copilot May 4, 2026 19:19

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Copilot AI review requested due to automatic review settings May 7, 2026 08:24

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 7, 2026 08:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comment thread pyproject.toml
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py
Copilot AI review requested due to automatic review settings May 7, 2026 09:12

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings June 17, 2026 18:52

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Comment thread src/fabric_cli/core/fab_constant.py
Comment thread pyproject.toml Outdated
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
Comment thread src/fabric_cli/commands/tables/fab_tables_schema.py Outdated
@ayeshurun

Copy link
Copy Markdown
Collaborator

@pkontek thank you for addressing the comments. Before reviewing your recent changes, please make sure all Copilot's comments are addressed too.

pkontek added 2 commits June 18, 2026 09:28
The exc str could expose abfss:// URIs (workspace/item GUIDs), Rust
delta-rs diagnostics, and storage-backend details in user-facing output.
Strip it from the message while preserving the cause via `from exc`.
Move token acquisition, abfss:// URI building, storage_options assembly,
and DeltaError → FabricCLIError mapping out of the command layer into a
new centralised client (fab_delta_client.py), consistent with the
fab_api_*.py pattern. The command now only resolves local_path and
delegates to delta_client.get_table_schema(). fab_delta_client provides
a single seam for future retry, timeout, and telemetry hooks.
Copilot AI review requested due to automatic review settings June 18, 2026 09:37

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Comment thread tests/test_commands/test_tables_schema.py Outdated
Comment thread tests/test_commands/test_tables_schema.py Outdated
Comment thread tests/test_commands/test_tables_schema.py Outdated
Comment thread src/fabric_cli/client/fab_delta_client.py
Comment thread src/fabric_cli/client/fab_delta_client.py
pkontek added 2 commits June 18, 2026 09:44
…chema command

Populate args.table_local_path in add_table_props_to_args so commands
receive the already-resolved path from the OneLake context. Removes the
manual f"Tables/..." reconstruction from fab_tables_schema.
…to_args

When OneLake path resolution falls back to the .Shortcut form, the
suffix would propagate into the abfss:// URI and cause DeltaTable
to fail. Strip it upstream in add_table_props_to_args via
remove_dot_suffix, consistent with how other OneLake commands handle it.

Also fix unit test patch paths and args after fab_delta_client extraction.
Copilot AI review requested due to automatic review settings June 18, 2026 09:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Comment thread src/fabric_cli/client/fab_delta_client.py
pkontek added 7 commits June 18, 2026 10:33
…mand

The allowlist lives in fab_delta_client next to the URI construction that makes the Tables/ assumption. Validation is skipped when item_type is absent so unit tests that construct args directly are unaffected.
Add test_complex_schema_field_contract asserting the exact JSON shape returned by DeltaTable.schema().to_json() for long, decimal, timestamp, map, and nested struct fields. Documents the non-obvious mappings (int64→"long", decimal128→"decimal(10,2)", timestamp→"timestamp_ntz") so format regressions are caught before they silently break scripts that pipe --output_format json.
Prove that schema extraction works when _delta_log contains only a checkpoint parquet file and no JSON commit logs (compacted-log scenario).
The fixture writes a real Delta table via pyarrow, calls create_checkpoint(), then removes all .json logs. The test patches only FabAuth and the DeltaTable constructor (to redirect the abfss:// URI to the local fixture); the actual schema().to_json() call runs against a real delta-rs reader.
Pure unit tests → tests/test_core/test_delta_client.py
Utils tests     → tests/test_utils/test_fab_cmd_table_utils.py
Integration     → tests/test_commands/test_tables_schema.py (only)
The cassette only exercised workspace/lakehouse CRUD that already has
coverage elsewhere; DeltaTable and FabAuth were both mocked out.
Copilot AI review requested due to automatic review settings June 18, 2026 13:22

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comment thread src/fabric_cli/utils/fab_cmd_table_utils.py
Comment thread tests/test_core/test_delta_client.py Outdated
Copilot AI review requested due to automatic review settings June 18, 2026 13:38

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Comment thread requirements-dev.txt
Comment on lines 12 to +15
requests
cryptography
fabric-cicd>=0.3.1
deltalake>=0.18.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] tables schema command fails to extract schema from Delta tables with checkpoint files

4 participants