Add DUACS daily->5-day-average preprocessing script #758
Draft
alxmrs wants to merge 2 commits into
New `make_duacs_5day` utility coarsens the daily (P1D) DUACS L4 altimetry product to non-overlapping consecutive 5-day means (coarsen boundary='trim', 366 -> 73 steps), mirroring the OM4 5-daily convention so the observations can sit alongside the emulator inputs.

- Keeps core ocean fields (adt, sla, ugos, ugosa, vgos, vgosa) plus flag_ice re-derived as a 0-1 ice-presence fraction; drops err_* and tpa_correction.
- Writes uncompressed float32, one chunk per timestep, to the user's bucket with a P1D->P5D name. Coiled by default, local via OCEAN_DUACS_CLUSTER.
- Reuses the OM4 pipeline's init_cluster, blosc single-thread guards, and retry-on-write logic. Streaming pass: native time chunk (50) is a clean multiple of the window, so there's no cross-chunk shuffle.
- --dry_run validates structure + a mid-grid sample and prints the full time axis without writing.

Native DUACS naming/grid preserved (not conformed to the emulator x/y/0-360 schema).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
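For reviewers who want to sanity-check the 366 -> 73 arithmetic, here is a minimal NumPy sketch of a trimmed, non-overlapping block mean. It is not the code in this PR (which uses xarray's `coarsen`); the function name `block_mean_trim` and the tiny 2x3 grid are made up for illustration, and the input is assumed to be exactly one sample per day.

```python
import numpy as np

WINDOW = 5  # days per block, matching the window used in this PR


def block_mean_trim(values, times, window=WINDOW):
    """Non-overlapping block means along axis 0, dropping any trailing remainder.

    Hypothetical helper for illustration; assumes `times` is daily datetime64.
    """
    n_blocks = values.shape[0] // window
    keep = n_blocks * window  # samples that fit into whole windows
    blocks = values[:keep].reshape(n_blocks, window, *values.shape[1:])

    # Label each block with the mean of its day numbers. For an odd window
    # this is the middle day; for an even window it truncates to the earlier day.
    day_numbers = times[:keep].astype("datetime64[D]").astype("int64")
    mid = day_numbers.reshape(n_blocks, window).mean(axis=1)
    labels = mid.astype("int64").astype("datetime64[D]")
    return blocks.mean(axis=1), labels


# Stand-in for one leap year of daily data: 366 days on a 2x3 grid,
# where every grid cell simply holds the day index.
times = np.arange("2020-01-01", "2021-01-01", dtype="datetime64[D]")
day_index = np.arange(times.size, dtype="float32")
values = day_index[:, None, None] * np.ones((1, 2, 3), dtype="float32")

means, labels = block_mean_trim(values, times)
print(means.shape)            # (73, 2, 3): 366 // 5 blocks, last day dropped
print(labels[0], labels[-1])  # 2020-01-03 2020-12-28
```

The last label is 28 December because the single leftover day (31 December) is trimmed rather than averaged into a short window.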
Dry-run output
Both the DUACS source and the output live on OSN (no egress fees), so running the reduction on Torch HPC instead of Coiled/AWS removes the cloud bill while OSN<->Torch transfer stays free. The job is CPU-only and the script is already cluster-agnostic, so no Python changes are needed.

- scripts/slurm_duacs_5day.sbatch: single CPU node, no Apptainer (uses the ocean_preprocessing mamba env directly via `mamba/conda run -n`), sources OSN creds from ~/.osn_env, runs --cluster=local sized to --cpus-per-task, streams from OSN and writes back. DRY_RUN/SRC/OUTPUT_PATH/WINDOW/ARGS overrides.
- docs/torch.md: new "Data Preprocessing On Torch" section documenting the env setup, credential handling, and dry-run-then-write flow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
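To make "sized to --cpus-per-task" concrete: Slurm exports `SLURM_CPUS_PER_TASK` when that option is given, so a worker count can be derived from it. The sketch below is only an illustration of that idea in Python; it is not taken from the sbatch script, and `local_worker_count` is a hypothetical helper name.

```python
import os


def local_worker_count(env=None, default=1):
    """Derive a local worker count from SLURM_CPUS_PER_TASK.

    Hypothetical helper: falls back to `default` when the variable is
    missing, non-numeric, or not positive (e.g. when run outside Slurm).
    """
    env = os.environ if env is None else env
    raw = env.get("SLURM_CPUS_PER_TASK", "")
    try:
        n = int(raw)
    except ValueError:
        return default
    return n if n > 0 else default


# Inside a job submitted with --cpus-per-task=16 this would print 16;
# on a laptop with no Slurm environment it prints the default.
print(local_worker_count())
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.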
What

New `ocean_preprocessing.make_duacs_5day` utility that coarsens the daily (P1D) global gridded DUACS L4 altimetry product to non-overlapping consecutive 5-day means, mirroring the OM4 5-daily convention so the observations can sit alongside the emulator inputs.

How

- `coarsen(time=5, boundary="trim").mean()` drops the trailing 1-day remainder (366 → 73 steps). Each window is labelled by its midpoint timestamp.
- Keeps `adt`, `sla`, `ugos`, `ugosa`, `vgos`, `vgosa`, plus `flag_ice` re-derived as a 0–1 ice-presence fraction (metadata relabelled). `err_*` and `tpa_correction` are dropped.
- Writes uncompressed float32, one chunk per timestep, to `s3://emulators/am16581/data/2026-06/...P5D...zarr` (P1D→P5D name swap).
- Reuses the OM4 pipeline's `init_cluster`, blosc single-thread guards, and retry-on-write. Coiled by default; set `OCEAN_DUACS_CLUSTER=local` to run locally. The native time chunk (50) is a clean multiple of the window, so there is no cross-chunk shuffle.
- Native DUACS naming and grid are preserved (`latitude`/`longitude`, −180…180, 0.125°), not conformed to the emulator x/y/0–360 schema.

Validation

`--dry_run` against the live store: structure correct, and a mid-grid sample yields physically sensible values (adt ≈ 0.43 m, sla ≈ 0.045 m, velocities in m/s). Dry-run output (final dataset repr + full `ds.time`) to be pasted in a comment below for review.

Tests

`tests/test_make_duacs_5day.py`: block-mean correctness, ice-flag→fraction, missing-var guard.

Run:
🤖 Generated with Claude Code
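As a note on the `flag_ice` handling described above: averaging a binary presence flag over each window gives the fraction of days with ice, which is why the result lands in 0–1. The sketch below shows that idea in NumPy under the assumption that any nonzero daily flag means "ice present"; the helper name `ice_fraction` and the sample flags are invented for illustration and are not the PR's implementation.

```python
import numpy as np


def ice_fraction(flag_ice_daily, window=5):
    """Fraction of days per window with ice present, trimming any remainder.

    Hypothetical helper; assumes a nonzero daily flag means ice is present.
    """
    n_blocks = flag_ice_daily.shape[0] // window
    keep = n_blocks * window
    # Binarise first so the mean is a true presence fraction in [0, 1].
    present = (flag_ice_daily[:keep] > 0).astype("float32")
    blocks = present.reshape(n_blocks, window, *present.shape[1:])
    return blocks.mean(axis=1)


# 16 daily flags at one grid point: three full windows plus one leftover day.
flags = np.array([0, 0, 1, 1, 1,
                  1, 1, 1, 1, 1,
                  0, 0, 0, 0, 0,
                  1], dtype="int8")

fractions = ice_fraction(flags)
print(fractions)  # three values: 3/5, 5/5 and 0/5 of days with ice
```

Keeping the result as float32 matches the output dtype stated for the written dataset.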