fix: filter out datasets with inconsistent database and LakeFS records #5171
xuang7 wants to merge 5 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #5171 +/- ##
============================================
- Coverage 47.15% 45.80% -1.35%
+ Complexity 2348 2342 -6
============================================
Files 1042 1046 +4
Lines 39989 40029 +40
Branches 4260 4259 -1
============================================
- Hits 18855 18335 -520
- Misses 20012 20580 +568
+ Partials 1122 1114 -8
*This pull request uses carry forward flags.
mengw15 left a comment
LGTM! Thanks for the fix! Before merging, could you run one last manual test? Thanks!
One behavior worth flagging: an owner with explicit access to an orphan dataset may still see it in their list with size=0. The accessible-datasets path in listDatasets doesn't call LakeFS, so the try/catch can't fire there; likely only the publicDatasets path drops orphans. This is probably fine: the owner sees a "broken dataset" instead of the dataset silently disappearing, which is arguably more informative.
Sounds good! In this version, the owner can still see the broken dataset in the list. I think it is okay to keep this behavior for now, since a visible entry serves as a reminder that the database and LakeFS records are inconsistent.
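To make the two paths discussed above concrete, here is a minimal sketch, not the actual listDatasets code. It assumes the accessible path is built from database rows alone (so an orphan keeps a default size of 0, as the review describes) while the public path calls LakeFS per row and skips rows whose lookup fails. `ApiException` here is a local stand-in for the LakeFS SDK exception, and `Dataset`, `Entry`, `SizeLookup`, and `ListDatasetsSketch` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the LakeFS SDK's ApiException (not the real class).
class ApiException extends Exception {
    ApiException(String message) { super(message); }
}

// Hypothetical dataset row and listing entry.
record Dataset(int did, String repositoryName) {}

record Entry(Dataset dataset, long sizeBytes) {}

// Hypothetical abstraction over the per-row LakeFS size call.
interface SizeLookup {
    long sizeOf(String repositoryName) throws ApiException;
}

class ListDatasetsSketch {
    // Accessible path: built from database rows only, with no LakeFS call,
    // so an orphan stays in the list with the default size of 0.
    static List<Entry> accessible(List<Dataset> owned) {
        List<Entry> out = new ArrayList<>();
        for (Dataset d : owned) {
            out.add(new Entry(d, 0L));
        }
        return out;
    }

    // Public path: each row calls LakeFS, and a failed lookup drops the row.
    static List<Entry> publicDatasets(List<Dataset> rows, SizeLookup lakeFs) {
        List<Entry> out = new ArrayList<>();
        for (Dataset d : rows) {
            try {
                out.add(new Entry(d, lakeFs.sizeOf(d.repositoryName())));
            } catch (ApiException e) {
                // Orphan: the try/catch only fires on this path, so the row is skipped.
            }
        }
        return out;
    }
}
```

Under these assumptions, the same orphan row appears (size 0) in the owner's accessible list but is absent from the public list, which matches the behavior the reviewer flagged.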
What changes were proposed in this PR?
This PR fixes an issue where dataset listings fail when a dataset's database record and its LakeFS repository are inconsistent (an "orphan" dataset): the failing LakeFS lookup for that one row makes the whole listing request fail. This breaks the workflow dataset picker and can also affect Hub dataset listings. The fix wraps the per-row retrieveRepositorySize call in a try/catch for ApiException, logs the orphan, and drops it from the response.
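The per-row try/catch described above can be sketched roughly as follows. This is an illustration of the pattern, not the PR's code: `ApiException` is a local stand-in for the LakeFS SDK exception, and `DatasetRow`, `DatasetListing`, `RepositorySizeLookup`, and `DatasetListingBuilder` are hypothetical names. Only the `retrieveRepositorySize` method name comes from the PR description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for the LakeFS SDK's ApiException; the real class
// lives in the LakeFS Java client and is not reproduced here.
class ApiException extends Exception {
    ApiException(String message) { super(message); }
}

// Hypothetical view of one dataset row read from the database.
record DatasetRow(int did, String name, String repositoryName) {}

// Hypothetical listing entry returned to the frontend.
record DatasetListing(DatasetRow dataset, long sizeBytes) {}

// Hypothetical abstraction over the per-row LakeFS call.
interface RepositorySizeLookup {
    long retrieveRepositorySize(String repositoryName) throws ApiException;
}

class DatasetListingBuilder {
    private static final Logger LOG = Logger.getLogger(DatasetListingBuilder.class.getName());

    // Builds the listing, skipping rows whose LakeFS repository lookup fails.
    static List<DatasetListing> build(List<DatasetRow> rows, RepositorySizeLookup lakeFs) {
        List<DatasetListing> result = new ArrayList<>();
        for (DatasetRow row : rows) {
            try {
                long size = lakeFs.retrieveRepositorySize(row.repositoryName());
                result.add(new DatasetListing(row, size));
            } catch (ApiException e) {
                // Orphan: the database has the row, but LakeFS cannot serve the repository.
                LOG.warning("Skipping dataset " + row.did() + " (" + row.repositoryName()
                        + "): LakeFS lookup failed: " + e.getMessage());
            }
        }
        return result;
    }
}
```

Catching per row keeps one orphan from failing the entire response, and catching only `ApiException` (rather than `Exception`) lets unrelated bugs still surface instead of being silently filtered out.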
Demo:
Any related issues, documentation, discussions?
Closes #5106
How was this PR tested?
Added two tests.
Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7