fix: filter out datasets with inconsistent database and LakeFS records by xuang7 · Pull Request #5171 · apache/texera

xuang7 · 2026-05-24T00:41:19Z

What changes were proposed in this PR?

This PR fixes an issue where dataset listings fail when a dataset record exists in the database but its LakeFS repository is missing or otherwise inconsistent. The failure breaks the workflow dataset picker and can also affect Hub dataset listings. The fix wraps the per-row retrieveRepositorySize call in a try/catch for ApiException, logs the orphaned dataset, and drops it from the response.
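For readers without the diff at hand, here is a minimal sketch of the per-row guard described above. It is not the code from this PR: `DatasetRow`, `DatasetEntry`, `OrphanFilterSketch`, and `listWithSizes` are hypothetical names, the local `ApiException` class is a stand-in for the LakeFS client exception so the snippet compiles on its own, and `retrieveRepositorySize` is passed in as a plain function rather than called on the real service.

```scala
// Stand-in for the LakeFS client's ApiException (assumption: the real one
// comes from the LakeFS SDK; it is redefined here to keep the sketch self-contained).
final class ApiException(message: String) extends Exception(message)

// Hypothetical, simplified shapes for a database row and a listing entry.
final case class DatasetRow(name: String, repositoryName: String)
final case class DatasetEntry(name: String, sizeInBytes: Long)

object OrphanFilterSketch {
  // Builds the listing row by row. A row whose repository lookup throws
  // ApiException is logged and dropped instead of failing the whole listing.
  def listWithSizes(
      rows: Seq[DatasetRow],
      retrieveRepositorySize: String => Long
  ): Seq[DatasetEntry] =
    rows.flatMap { row =>
      try {
        Some(DatasetEntry(row.name, retrieveRepositorySize(row.repositoryName)))
      } catch {
        case e: ApiException =>
          // A real service would use its logger; println keeps the sketch dependency-free.
          println(s"Skipping dataset '${row.name}': LakeFS lookup failed (${e.getMessage})")
          None
      }
    }
}
```

Only `ApiException` is caught, so any other failure still propagates; that keeps unrelated errors visible rather than silently shrinking the list.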

Demo: Before / After (images not preserved)

Any related issues, documentation, discussions?

Closes #5106

How was this PR tested?

Added two tests.

Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

codecov-commenter · 2026-05-24T00:43:23Z

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 45.80%. Comparing base (c435aa7) to head (de46600).

Files with missing lines	Patch %	Lines
...exera/web/resource/dashboard/hub/HubResource.scala	0.00%	2 Missing ⚠️
...ache/texera/service/resource/DatasetResource.scala	0.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #5171      +/-   ##
============================================
- Coverage     47.15%   45.80%   -1.35%     
+ Complexity     2348     2342       -6     
============================================
  Files          1042     1046       +4     
  Lines         39989    40029      +40     
  Branches       4260     4259       -1     
============================================
- Hits          18855    18335     -520     
- Misses        20012    20580     +568     
+ Partials       1122     1114       -8

Flag	Coverage Δ		*Carryforward flag
access-control-service	`39.53% <ø> (ø)`
agent-service	`33.74% <ø> (-0.03%)`	⬇️	Carriedforward from c4a945d
amber	`50.29% <0.00%> (-0.07%)`	⬇️
computing-unit-managing-service	`0.00% <ø> (ø)`
config-service	`0.00% <ø> (ø)`
file-service	`32.45% <0.00%> (+0.26%)`	⬆️
frontend	`34.62% <ø> (-3.20%)`	⬇️	Carriedforward from c4a945d
python	`90.50% <ø> (ø)`		Carriedforward from c4a945d
workflow-compiling-service	`56.81% <ø> (ø)`

*This pull request uses carry forward flags.

mengw15

Left one comment

mengw15

LGTM! Thanks for the fix! Before merging, could you run one last manual test? Thanks!
One behavior worth flagging: an owner with explicit access to an orphan dataset may still see it in their list with size=0. The accessible-datasets path in listDatasets doesn't call LakeFS, so the try-catch can't fire there; it may be that only the publicDatasets path drops orphans. This is probably fine: the owner sees a "broken dataset" instead of "my dataset silently disappeared", which is arguably more informative.

xuang7 · 2026-05-25T21:36:55Z

Sounds good! In this version, the owner can still see the broken dataset in the list. I think it may be okay to keep this behavior for now, since keeping it visible can serve as a reminder that something is inconsistent.

xuang7 added 2 commits May 23, 2026 17:22

update.

d352629

Merge branch 'main' into fix/filter-mismatched-datasets

a98106b

github-actions Bot assigned xuang7 May 24, 2026

github-actions Bot added the engine, fix, common, and platform (Non-amber Scala service paths) labels May 24, 2026

xuang7 requested a review from aicam May 24, 2026 00:44

chenlica requested a review from mengw15 May 25, 2026 07:08

mengw15 reviewed May 25, 2026

Comment thread file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala Outdated

mengw15 and others added 2 commits May 25, 2026 01:03

Merge branch 'main' into fix/filter-mismatched-datasets

c4a945d

update.

e4a4f50

github-actions Bot removed the common label May 25, 2026

update..

de46600

xuang7 requested a review from mengw15 May 25, 2026 20:52

mengw15 approved these changes May 25, 2026

xuang7 wants to merge 5 commits into apache:main from xuang7:fix/filter-mismatched-datasets
3 participants