
Support glob patterns in open_datatree(group=...) for selective group loading#11302

Open
aladinor wants to merge 8 commits into pydata:main from aladinor:glob-group-filtering-standalone


@aladinor (Contributor)

Summary

When the group argument contains glob metacharacters (*, ?, or [), it is used to filter which groups are opened instead of re-rooting the tree at that group. This avoids loading the entire hierarchy when only a subset is needed.
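A minimal sketch of the detection step, assuming it is a plain membership test for the three metacharacters named above. is_glob_pattern is a hypothetical stand-in for the PR's _is_glob_pattern, whose exact body is not shown here.

```python
from __future__ import annotations

# Hypothetical stand-in for the PR's _is_glob_pattern helper; the
# membership test below is an assumption, not the PR's actual code.
GLOB_METACHARACTERS = ("*", "?", "[")


def is_glob_pattern(group: str | None) -> bool:
    """Return True if `group` contains any glob metacharacter."""
    if group is None:
        return False
    return any(char in group for char in GLOB_METACHARACTERS)


print(is_glob_pattern("VCP-34"))     # False: plain path, so root selection
print(is_glob_pattern("*/sweep_0"))  # True: filter mode
```

Checking for the metacharacters up front keeps plain group names on the existing root-selection path, so their behavior does not change.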

Use cases

  • Radar data: xr.open_datatree("radar.nc", group="*/sweep_0") loads only the lowest-elevation sweep from each volume scan
  • CMIP archives: xr.open_datatree("cmip.zarr", group="*/historical/tas") loads only the near-surface air temperature (tas) group of the historical experiment for every model
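The use-case patterns rely on pathlib.PurePosixPath.match, which compares a relative pattern against a path from the right, one segment per pattern segment. The sketch below applies the two patterns to group paths that are purely illustrative (an assumed layout, not read from real radar or CMIP files); select is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Illustrative group paths (assumed layout, not from a real file).
radar_groups = [
    "/VCP-34/sweep_0",
    "/VCP-34/sweep_1",
    "/VCP-35/sweep_0",
    "/VCP-35/sweep_0/moments",
]
cmip_groups = [
    "/CESM2/historical/tas",
    "/CESM2/historical/pr",
    "/MIROC6/historical/tas",
    "/MIROC6/ssp585/tas",
]


def select(paths, pattern):
    """Hypothetical helper: return the paths that match `pattern`."""
    return [p for p in paths if PurePosixPath(p).match(pattern)]


print(select(radar_groups, "*/sweep_0"))
# ['/VCP-34/sweep_0', '/VCP-35/sweep_0']
print(select(cmip_groups, "*/historical/tas"))
# ['/CESM2/historical/tas', '/MIROC6/historical/tas']
```

Because matching is anchored at the right, the pattern "*/sweep_0" does not itself match the deeper group /VCP-35/sweep_0/moments.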

Changes

  • Added shared utilities _is_glob_pattern, _filter_group_paths, and _resolve_group_and_filter in common.py
  • Updated NetCDF4, H5NetCDF, and Zarr backends to use a discover → filter → open pipeline
  • Uses the same matching engine as DataTree.match() (PurePosixPath.match)
  • Root (/) and all ancestors of matched nodes are always included to form a valid tree
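The discover → filter → open pipeline can be sketched in pure Python. This assumes that listing group paths is cheap compared with reading each group's data. HIERARCHY, discover_group_paths, filter_group_paths, and open_group are hypothetical stand-ins, not the backends' real code. Only the matching rule (PurePosixPath.match) and the "root plus ancestors" rule come from the list above.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory hierarchy standing in for a file's groups:
# each key is a group name, each value holds that group's child groups.
HIERARCHY = {
    "VCP-34": {"sweep_0": {}, "sweep_1": {}},
    "VCP-35": {"sweep_0": {}, "sweep_1": {}},
}


def discover_group_paths(children, parent=PurePosixPath("/")):
    """Discover: list every group path below `parent`, depth-first."""
    paths = []
    for name, grandchildren in children.items():
        path = parent / name
        paths.append(str(path))
        paths.extend(discover_group_paths(grandchildren, path))
    return paths


def filter_group_paths(paths, pattern):
    """Filter: keep the root, matched paths, and all of their ancestors."""
    keep = {"/"}
    for path in paths:
        if PurePosixPath(path).match(pattern):
            keep.add(path)
            keep.update(str(parent) for parent in PurePosixPath(path).parents)
    # Preserve discovery order so each parent precedes its children.
    return [p for p in paths if p in keep]


opened = []


def open_group(path):
    """Open: stand-in for a backend reading one group into a Dataset."""
    opened.append(path)
    return {"path": path}


all_paths = ["/"] + discover_group_paths(HIERARCHY)
selected = filter_group_paths(all_paths, "*/sweep_0")
groups = {path: open_group(path) for path in selected}
print(opened)
# ['/', '/VCP-34', '/VCP-34/sweep_0', '/VCP-35', '/VCP-35/sweep_0']
```

The unmatched sweep_1 groups are discovered but never opened. Keeping the ancestors (/VCP-34, /VCP-35) is what lets the selected groups still assemble into a valid tree.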

Behavior summary

| group value | Behavior |
|---|---|
| None | Load all groups (unchanged) |
| "VCP-34" (no glob chars) | Root selection (unchanged) |
| "*/sweep_0" (glob chars) | Filter mode: only matched groups and their ancestors |
| Pattern matches nothing | Root-only tree |
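One way to express the table as a dispatch is sketched below. It assumes a return value of (root_group, pattern), which is a guess: the PR's _resolve_group_and_filter may use a different signature. Both resolve_group_and_filter and keep_matched_and_ancestors are hypothetical names.

```python
from __future__ import annotations

from pathlib import PurePosixPath

GLOB_METACHARACTERS = ("*", "?", "[")


def resolve_group_and_filter(group: str | None) -> tuple[str | None, str | None]:
    """Hypothetical stand-in for _resolve_group_and_filter.

    Returns (root_group, pattern); at most one of the two is set.
    """
    if group is None:
        return None, None  # load all groups (unchanged)
    if any(char in group for char in GLOB_METACHARACTERS):
        return None, group  # filter mode
    return group, None  # root selection (unchanged)


def keep_matched_and_ancestors(paths: list[str], pattern: str) -> list[str]:
    """Keep the root, every path matching `pattern`, and their ancestors."""
    keep = {"/"}
    for path in paths:
        if PurePosixPath(path).match(pattern):
            keep.add(path)
            keep.update(str(parent) for parent in PurePosixPath(path).parents)
    return [p for p in paths if p in keep]


paths = ["/", "/VCP-34", "/VCP-34/sweep_0", "/VCP-34/sweep_1"]

for group in (None, "VCP-34", "*/sweep_0", "*/sweep_9"):
    root, pattern = resolve_group_and_filter(group)
    if pattern is not None:
        print(repr(group), "-> filter:", keep_matched_and_ancestors(paths, pattern))
    else:
        print(repr(group), "-> root:", root)
# None -> root: None
# 'VCP-34' -> root: VCP-34
# '*/sweep_0' -> filter: ['/', '/VCP-34', '/VCP-34/sweep_0']
# '*/sweep_9' -> filter: ['/']
```

The last row shows why a non-matching pattern yields a root-only tree: the root is seeded into the keep set before any matching happens.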

Test plan

  • 27 new tests covering all backends (netCDF4, h5netcdf, zarr v2/v3)
  • Unit tests for _is_glob_pattern, _filter_group_paths, _resolve_group_and_filter with *, ?, []
  • Integration tests: glob match, no-match, data preservation, open_groups API
  • Full test_backends_datatree.py suite passes (228 passed, 0 failures)
  • Pre-commit checks pass
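In the spirit of the unit tests listed above, a few plain assertions illustrate how ? and [] behave under PurePosixPath.match. The paths and cases are illustrative and are not copied from the PR's test suite.

```python
from pathlib import PurePosixPath

# Illustrative (path, pattern, expected) cases for ? and [] metacharacters.
CASES = [
    ("/VCP-34/sweep_0", "*/sweep_?", True),      # ? matches one character
    ("/VCP-34/sweep_10", "*/sweep_?", False),    # ...but not two
    ("/VCP-34/sweep_1", "*/sweep_[01]", True),   # [] matches a character set
    ("/VCP-34/sweep_2", "*/sweep_[01]", False),
    ("/VCP-34/sweep_0", "VCP-3[0-9]/*", True),   # ranges work inside []
    ("/VCP-12/sweep_0", "VCP-3[0-9]/*", False),
]

for path, pattern, expected in CASES:
    result = PurePosixPath(path).match(pattern)
    assert result is expected, (path, pattern, result)
print(f"{len(CASES)} cases passed")
```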

github-actions bot added the topic-backends, topic-zarr (Related to zarr storage library), and io labels on Apr 16, 2026
Add _is_glob_pattern, _filter_group_paths, and _resolve_group_and_filter
to common.py for detecting and applying glob patterns to group paths.
Use _resolve_group_and_filter in open_groups_as_dict to support glob
patterns in the group parameter for selective group loading.
Update docstrings for the group kwarg in open_datatree and open_groups
to describe glob metacharacter behavior.
Add integration tests for netCDF4, h5netcdf, and zarr backends, plus
unit tests for _is_glob_pattern, _filter_group_paths, and
_resolve_group_and_filter covering *, ?, and [] metacharacters.
aladinor force-pushed the glob-group-filtering-standalone branch from e892524 to 5fb46e1 on April 16, 2026 17:09

Labels

io, topic-backends, topic-zarr (Related to zarr storage library)

Development

Successfully merging this pull request may close these issues.

Support glob patterns in open_datatree(group=...) for selective group loading
