HealDA IO Improvement: Parallel decode + Obstore by negin513 · Pull Request #914 · NVIDIA/earth2studio

negin513 · 2026-06-09T09:57:30Z

Earth2Studio Pull Request

Description

This PR adds the second HealDA data-path improvement: parallelizing the NetCDF→DataFrame decode of the UFS GSI observation files. After #913, the remaining cost is the per-file HDF5→pandas decode, which is CPU- and GIL-bound and runs single-threaded.

UFSObsConv / UFSObsSat read hundreds of diag_*.nc4 files per analysis; each is parsed (h5netcdf), char→string converted, channel-index expanded, and built into a DataFrame. This PR fans that per-file work across forked worker processes.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

Swap UFSObsConv/UFSObsSat per-file fetch from s3fs._cat_file to obstore get_range_async/get_async on a cached anonymous per-bucket S3Store. Removes the s3fs filesystem/session plumbing; preserves cache paths, byte-range semantics, and missing-file (404) handling. Adds obstore dependency (s3fs kept for other data sources). Speeds up the HealDA obs fetch (the ~90% data-path cost): cold obs fetch 49.4s -> 38s, end-to-end HealDA pipeline -22% on GB200 (cross-cloud to NOAA S3), identical analysis output.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Decode each GSI file's HDF5->DataFrame across forked workers, on top of the obstore fetch. Worker count is chosen automatically: min(available_cpus, 16, n_files). Falls back to serial decode if a CUDA context already exists in the process (fork-after-CUDA is unsafe). There is no configuration knob; when only one worker is selected, the serial path behaves exactly as before. Decode is ~5x faster; HealDA e2e drops further on top of the obstore fetch.

Signed-off-by: Negin Sobhani <nsobhani@nvidia.com>

greptile-apps · 2026-06-09T10:00:35Z

Greptile Summary

This PR replaces s3fs with obstore for S3 I/O and fans out the per-file HDF5→DataFrame decode step across forked worker processes using ProcessPoolExecutor, targeting the CPU/GIL-bound bottleneck that remained after the async fetch improvements in #913.

obstore migration: _async_init and s3fs.S3FileSystem are replaced with lazily-created per-bucket S3Store instances; byte-range and full-object fetches are rewritten using obs.get_range_async / obs.get_async.
Parallel decode: _compile_dataframe now splits _GSIAsyncTask lists into chunks and dispatches them to a forked process pool via _decode_chunk_idx, with automatic worker-count selection capped at 16 and a CUDA-initialized safety fallback; the per-chunk work lives in the new _compile_chunk method.
Global context dict: a module-level _DECODE_CTX dict carries the unpicklable self/chunks/schema into forked workers via copy-on-write, which is unsafe under concurrent calls.

Confidence Score: 3/5

Safe on Linux with sequential use; risks incorrect results or hangs on macOS and under concurrent callers due to two separate defects in the parallel-decode path.

The macOS guard checks whether fork is in all available start methods — it is on macOS, so the guard never triggers and the code proceeds to fork with get_context('fork') despite the comment explicitly calling that unsafe. The module-level _DECODE_CTX dict is written then read by forked workers non-atomically; concurrent calls could overwrite each other's context before the fork occurs, silently producing results built from the wrong tasks.

earth2studio/data/ufs.py — specifically _resolve_decode_workers (macOS guard) and _compile_dataframe (global context dict).

Important Files Changed

Filename	Overview
earth2studio/data/ufs.py	Replaces s3fs with obstore for S3 I/O and adds fork-based parallel decode via ProcessPoolExecutor; has a macOS safety-guard bug and a thread-unsafe module-level context dict that could produce wrong results under concurrent use.
pyproject.toml	Adds obstore>=0.8 as a core dependency; s3fs is no longer imported in ufs.py but is not removed from dependencies — verify whether it's still needed elsewhere.

Reviews (1): Last reviewed commit: "data/ufs: parallelize NetCDF decode acro..."

greptile-apps · 2026-06-09T10:00:39Z

+        if workers > 1 and (
+            "fork" not in multiprocessing.get_all_start_methods()
+            or _cuda_initialized()
+        ):
+            # Parallel decode requires the 'fork' start method (not available on
+            # Windows; unsafe on macOS). Also unsafe once CUDA is initialized.
+            # In either case fall back to serial decode.
+            workers = 1


macOS safety guard does not cover macOS

The condition "fork" not in multiprocessing.get_all_start_methods() evaluates to False on macOS because fork is technically in the list of available methods — it just isn't the safe/default one. The code comment explicitly states "unsafe on macOS … fall back to serial decode," but the guard never triggers on macOS, and line 300 unconditionally creates the executor with get_context("fork"). On macOS this will proceed with fork-after-threads, risking deadlocks or corrupted state from Objective-C/Core Foundation globals inherited by workers. The fix is to also check sys.platform == "darwin" (or compare against multiprocessing.get_start_method(), which defaults to "spawn" on macOS) before allowing parallel decode.

greptile-apps · 2026-06-09T10:00:40Z

+        _DECODE_CTX["args"] = (self, chunks, variables, schema)
+        try:
+            with concurrent.futures.ProcessPoolExecutor(
+                max_workers=len(chunks),
+                mp_context=multiprocessing.get_context("fork"),
+            ) as executor:
+                parts = list(executor.map(_decode_chunk_idx, range(len(chunks))))
+        finally:
+            _DECODE_CTX.clear()


Thread-unsafe module-level global _DECODE_CTX

_DECODE_CTX["args"] is written on line 296 and the forked workers inherit it via copy-on-write. If two calls to _compile_dataframe are in-flight simultaneously — e.g., when _sync_async dispatches work across threads or when two separate instances are driven concurrently — Thread A's write can be overwritten by Thread B before A's workers are forked. Workers from Thread A would then process Thread B's tasks, silently producing wrong results or raising index/shape errors. Even with the GIL protecting individual dict operations, the three-step sequence (write global → fork workers → workers read global) is not atomic. Consider replacing the global with a multiprocessing.Manager shared value, threading locks around the critical section, or — simplest — passing the context as pickled init args to the pool initializer instead of relying on global inheritance.

greptile-apps · 2026-06-09T10:00:41Z

+        except (FileNotFoundError, obs.exceptions.NotFoundError):
+            self._handle_missing_file(path)
+        except Exception as err:
+            raise


The except Exception as err: raise block does nothing — it catches the exception and immediately re-raises it with no logging, wrapping, or side-effects. Remove it so unhandled exceptions propagate naturally from the try block.

Suggested change

-        except (FileNotFoundError, obs.exceptions.NotFoundError):
-            self._handle_missing_file(path)
-        except Exception as err:
-            raise
+        except (FileNotFoundError, obs.exceptions.NotFoundError):
+            self._handle_missing_file(path)


negin513 and others added 4 commits June 9, 2026 00:46

Update earth2studio/data/ufs.py

8ba2c4e

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Merge branch 'main' into obstore-ufs-obs-fetch

07b457c

greptile-apps Bot reviewed Jun 9, 2026


NickGeneva added the "! - Release" label (PRs or Issues relating to a release) on Jun 22, 2026

negin513 wants to merge 4 commits into NVIDIA:main from negin513:parallel-decode
