fix: tables schema command replace manual _delta_log JSON parsing with DeltaTable.schema() from the deltalake library#229
Open
pkontek wants to merge 33 commits into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes fab tables schema failing on Delta tables whose pre-checkpoint JSON logs were vacuumed, by switching schema extraction to the Delta protocol via the deltalake library.
Changes:
- Replaced manual
_delta_log/*.jsonscanning withdeltalake.DeltaTable(...).schema()for checkpoint-aware schema extraction. - Added
deltalake>=0.18.0to project dependencies. - Renamed typo’d error constant
ERROR_INVALID_DETLA_TABLEtoERROR_INVALID_DELTA_TABLEand wired it into the tables schema command.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/fabric_cli/core/fab_constant.py | Fixes typo in error constant name; minor formatting cleanup. |
| src/fabric_cli/commands/tables/fab_tables_schema.py | Refactors schema retrieval to use deltalake over ABFSS + token, removing fragile log parsing. |
| pyproject.toml | Adds deltalake dependency required for robust schema extraction. |
| .changes/unreleased/fixed-20260430-130558.yaml | Adds release note entry for the schema extraction fix. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Collaborator
|
@pkontek thank you for addressing the comments. Before reviewing your recent changes, please make sure all Copilot's comments are addressed too. |
The exc str could expose abfss:// URIs (workspace/item GUIDs), Rust delta-rs diagnostics, and storage-backend details in user-facing output. Strip it from the message while preserving the cause via `from exc`.
Move token acquisition, abfss:// URI building, storage_options assembly, and DeltaError → FabricCLIError mapping out of the command layer into a new centralised client (fab_delta_client.py), consistent with the fab_api_*.py pattern. The command now only resolves local_path and delegates to delta_client.get_table_schema(). fab_delta_client provides a single seam for future retry, timeout, and telemetry hooks.
…chema command Populate args.table_local_path in add_table_props_to_args so commands receive the already-resolved path from the OneLake context. Removes the manual f"Tables/..." reconstruction from fab_tables_schema.
…to_args When OneLake path resolution falls back to the .Shortcut form, the suffix would propagate into the abfss:// URI and cause DeltaTable to fail. Strip it upstream in add_table_props_to_args via remove_dot_suffix, consistent with how other OneLake commands handle it. Also fix unit test patch paths and args after fab_delta_client extraction.
…mand The allowlist lives in fab_delta_client next to the URI construction that makes the Tables/ assumption. Validation is skipped when item_type is absent so unit tests that construct args directly are unaffected.
Add test_complex_schema_field_contract asserting the exact JSON shape returned by DeltaTable.schema().to_json() for long, decimal, timestamp, map, and nested struct fields. Documents the non-obvious mappings (int64→"long", decimal128→"decimal(10,2)", timestamp→"timestamp_ntz") so format regressions are caught before they silently break scripts that pipe --output_format json.
Prove that schema extraction works when _delta_log contains only a checkpoint parquet file and no JSON commit logs (compacted-log scenario). The fixture writes a real Delta table via pyarrow, calls create_checkpoint(), then removes all .json logs. The test patches only FabAuth and the DeltaTable constructor (to redirect the abfss:// URI to the local fixture); the actual schema().to_json() call runs against a real delta-rs reader.
Pure unit tests → tests/test_core/test_delta_client.py Utils tests → tests/test_utils/test_fab_cmd_table_utils.py Integration → tests/test_commands/test_tables_schema.py (only)
The cassette only exercised workspace/lakehouse CRUD that already has coverage elsewhere; DeltaTable and FabAuth were both mocked out.
Comment on lines
12
to
+15
| requests | ||
| cryptography | ||
| fabric-cicd>=0.3.1 | ||
| deltalake>=0.18.0 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📥 Pull Request
✨ Description of new changes
Summary: The
fab tables schemacommand failed with[InvalidDeltaTable] Failed to extract the table schemafor Delta tables where pre-checkpoint JSON commit log files had been cleaned up. The previous implementation manually scanned_delta_log/*.jsonfiles via the OneLake API looking for ametaDataentry — an approach that breaks once Delta Lake compacts logs into.checkpoint.parquetfiles and removes the preceding JSON entries.Context: Delta Lake creates a checkpoint every 10 transactions by default and retains log files according to the configured retention policy. Once older JSON commit files are removed, the
metaDataentry (which carries the schema) is no longer available in the remaining JSON logs, causing the command to fail even for healthy, accessible tables.Dependencies: Adds
deltalake>=0.18.0as a new dependency. Thedeltalakelibrary correctly resolves the table schema from both JSON logs and Parquet checkpoints using the standard Delta protocol, making the implementation robust and protocol-compliant.Changes:
_delta_logJSON parsing withDeltaTable.schema()from thedeltalakelibrarydeltalake>=0.18.0to project dependencies inpyproject.tomlfab_tables_schema.pyby removing ~50 lines of fragile log-parsing logic