Colab quickstart #722
@codex, can you review?
61ec57c to
4046f8c
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 61ec57c1f3
if src not in sys.path:
    sys.path.insert(0, src)

import ocean_emulators  # noqa: F401 (smoke import)
Enforce Python 3.12 before importing project code
The quickstart install cell imports ocean_emulators unconditionally, but it never verifies the interpreter version first. This repository requires Python >=3.12 (see pyproject.toml), and the codebase already uses 3.12-only syntax (for example, type aliases in src/ocean_emulators/constants.py), so users on a Colab runtime that is still Python 3.11 will fail here with a SyntaxError before training even starts. Add a sys.version_info guard (with clear upgrade instructions) before this import to prevent the notebook from breaking at setup time.
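A minimal sketch of the kind of guard described above, assuming it would sit in the install cell just before the `ocean_emulators` import. The helper name `check_python` and the wording of the error message are hypothetical; the PR may implement the check differently.

```python
import sys

# Minimum version taken from the review comment (the repo requires Python >=3.12).
MIN_PYTHON = (3, 12)


def check_python(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Raise a readable error if the interpreter is older than `minimum`.

    `check_python` is a hypothetical helper name used only for illustration.
    """
    current = tuple(version_info[:2])
    if current < minimum:
        found = ".".join(str(part) for part in current)
        needed = ".".join(str(part) for part in minimum)
        raise RuntimeError(
            f"Ocean Emulator requires Python {needed} or newer, but this "
            f"runtime is Python {found}. Switch to a runtime with a newer "
            "Python before running the remaining cells."
        )
```

Calling `check_python()` ahead of `import ocean_emulators` would surface a version mismatch as a readable `RuntimeError` rather than a `SyntaxError` raised from inside the package.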
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Codex <codex@openai.com>
4046f8c to
3fde428
alxmrs left a comment:
I ran out of time yesterday to complete a full review. Here are my pending notes to minimize blocking you.
"## 2. Slim install\n",
"\n",
"Ocean Emulator's full dependency set includes a few packages that are heavy or\n",
"brittle on Colab (`xesmf` needs the ESMF system library; `skypilot`,\n",
This is good for now, but in the future I think we should clean up the dependencies in our pyproject.toml.
"\n",
"`scripts/clone_data.py` pulls from the public OSN bucket\n",
"`https://nyu1.osn.mghpcc.org/m2lines-pubs/Samudra/` — no credentials needed.\n",
"\n",
This is the wrong data source! I'm about to change this. Let's make sure it is fixed before we land this PR!
"    subprocess.run(\n",
"        [\n",
"            \"python\",\n",
"            f\"{REPO_DIR}/scripts/clone_data.py\",\n",
"            str(DATA_DIR),\n",
"            \"--time_start\", \"0\",\n",
"            \"--time_end\", \"290\",\n",
"            \"--write_time_chunks\", \"1\",\n",
"        ],\n",
"        env={**os.environ, \"PYTHONPATH\": f\"{REPO_DIR}/src\"},\n",
"        check=True,\n",
I would recommend against subprocess; instead, put ! at the front of the line in the Colab notebook to run shell commands.
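To make the suggestion concrete, here is a rough sketch of the shell line that the `subprocess.run` call in the diff corresponds to. The script path and flags are copied from that diff, while the values of `REPO_DIR` and `DATA_DIR` are hypothetical placeholders. The snippet only builds and prints the command string; the Colab form is shown in the trailing comment, since `!` lines are IPython syntax rather than plain Python.

```python
import shlex

# Hypothetical placeholder paths; the notebook defines its own values.
REPO_DIR = "/content/ocean_emulators"
DATA_DIR = "/content/data"

# Arguments copied from the subprocess.run call in the diff above.
args = [
    "python",
    f"{REPO_DIR}/scripts/clone_data.py",
    DATA_DIR,
    "--time_start", "0",
    "--time_end", "290",
    "--write_time_chunks", "1",
]

# Prefix the environment variable the same way a shell one-liner would.
command = f"PYTHONPATH={shlex.quote(REPO_DIR + '/src')} {shlex.join(args)}"
print(command)

# In a Colab cell the same command could be written with IPython's "!" syntax,
# which interpolates Python variables inside curly braces:
#   !PYTHONPATH={REPO_DIR}/src python {REPO_DIR}/scripts/clone_data.py {DATA_DIR} --time_start 0 --time_end 290 --write_time_chunks 1
```

The `!` form streams the script's output straight into the cell, so progress from the download is visible as it happens.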
"source": [
    "## 4. Train\n",
    "\n",
    "`configs/quickstart/train.yaml` is the same shape as the production v2 config\n",
Figuring out this config is the hard part; thanks for finding a good small model setup.
"id": "next-steps",
"metadata": {},
"source": [
    "## Where to go next\n",
# T4 (Turing, sm_75) does not support bfloat16; the quickstart config disables
# it. A100/L4/H100 do support it — feel free to flip use_bfloat16 back on.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
I recommend using ! commands in Colab to run things in the shell.
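The hardware note in the snippet above (T4 lacks bfloat16, A100/L4/H100 have it) can be expressed as a small check on CUDA compute capability. This is only an illustrative sketch: the helper name `supports_bfloat16` is hypothetical, and it treats sm_80 (Ampere) as the cutoff, which is consistent with the GPUs listed in that comment.

```python
def supports_bfloat16(capability):
    """Return True if a CUDA compute capability (major, minor) is sm_80 or newer.

    Hypothetical helper for illustration; not part of the PR.
    """
    return tuple(capability) >= (8, 0)


# Compute capabilities of the GPUs named in the comment above.
KNOWN_GPUS = {
    "T4": (7, 5),    # Turing
    "A100": (8, 0),  # Ampere
    "L4": (8, 9),    # Ada Lovelace
    "H100": (9, 0),  # Hopper
}

for name, capability in KNOWN_GPUS.items():
    print(f"{name}: use_bfloat16={supports_bfloat16(capability)}")
```

In a live runtime, and assuming PyTorch is installed, the `(major, minor)` tuple could be read with `torch.cuda.get_device_capability()` and passed to the same helper.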
I uploaded this into Google Colab, and one error I hit so far is that we cannot clone this repo because it is private.
Summary
Adds a browser-based Google Colab quickstart for Ocean Emulator.
The quickstart trains a small Samudra model on a public 1° OM4 data slice, using a slim dependency set and no S3 credentials. It is intended as an onboarding smoke test for new users and contributors.
Changes
- notebooks/quickstart.ipynb
- scripts/build_quickstart_notebook.py so the notebook can be regenerated from plain Python cell sources
- configs/quickstart/requirements-quickstart.txt for Colab-friendly dependencies
- README.md
Testing
- UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/test_quickstart_config.py -q -m 'not cuda'
- UV_CACHE_DIR=/tmp/uv-cache uv run python scripts/build_quickstart_notebook.py
- UV_CACHE_DIR=/tmp/uv-cache uv run python -m py_compile scripts/build_quickstart_notebook.py tests/test_quickstart_config.py
- git diff --check
Notes
I have not yet run the full notebook in a live Colab runtime.
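For readers unfamiliar with regenerating a notebook from plain Python cell sources, as described under Changes, the general idea can be sketched with only the standard library. This is not the contents of `scripts/build_quickstart_notebook.py`: the helper names, cell ids, and cell text below are hypothetical, and the dictionary layout follows the Jupyter nbformat 4 structure.

```python
import json


def make_cell(cell_id, cell_type, text):
    """Build one nbformat-4 cell dict from a plain string of source text."""
    cell = {
        "cell_type": cell_type,
        "id": cell_id,
        "metadata": {},
        # Notebook JSON stores source as a list of lines with newlines kept.
        "source": text.splitlines(keepends=True),
    }
    if cell_type == "code":
        # Code cells carry execution state; emit them unexecuted.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def build_notebook(cells):
    """Wrap (id, type, text) tuples in a minimal nbformat-4 document."""
    return {
        "cells": [make_cell(*cell) for cell in cells],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


# Hypothetical cell sources, for illustration only.
CELLS = [
    ("intro", "markdown", "## Example heading\n\nSome explanatory text.\n"),
    ("hello", "code", "print('hello from the quickstart')\n"),
]

notebook_json = json.dumps(build_notebook(CELLS), indent=1)
```

Writing `notebook_json` to an `.ipynb` file yields a notebook with empty outputs, which keeps diffs small when the cell sources change.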