Colab quickstart #722

Open
YuanYuan98 wants to merge 2 commits into main from colab-quickstart

@YuanYuan98

Collaborator

Summary

Adds a browser-based Google Colab quickstart for Ocean Emulator.

The quickstart trains a small Samudra model on a public 1° OM4 data slice, using a slim dependency set and no S3 credentials. It is intended as an onboarding smoke test for new users and contributors.

Changes

  • Add notebooks/quickstart.ipynb
  • Add scripts/build_quickstart_notebook.py so the notebook can be regenerated from plain Python cell sources
  • Add quickstart Samudra config under configs/quickstart/
  • Add requirements-quickstart.txt for Colab-friendly dependencies
  • Add a quickstart smoke test against the mock OM4 fixture
  • Add a Colab badge/link to README.md
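For readers unfamiliar with generated notebooks, the regeneration step could look roughly like the sketch below. The contents of scripts/build_quickstart_notebook.py are not shown in this conversation, so the `make_cell` and `build_notebook` helpers and the cell texts are hypothetical; only the nbformat 4 JSON layout is standard.

```python
import json


def make_cell(cell_id, cell_type, text):
    """Return one nbformat-4 cell dict (hypothetical helper)."""
    cell = {
        "id": cell_id,
        "cell_type": cell_type,
        "metadata": {},
        # nbformat stores source as a list of lines, each keeping its newline.
        "source": text.splitlines(keepends=True),
    }
    if cell_type == "code":
        # Code cells carry two extra required fields.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def build_notebook(cells):
    """Assemble (id, type, text) triples into a notebook dict."""
    return {
        "cells": [make_cell(*c) for c in cells],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


# Illustrative cell sources only; the real quickstart cells differ.
CELLS = [
    ("intro", "markdown", "# Quickstart\n\nIllustrative text only."),
    ("smoke-import", "code", "import sys\nprint(sys.version)"),
]

notebook = build_notebook(CELLS)
notebook_json = json.dumps(notebook, indent=1)
```

Keeping the cells as plain Python strings means a reviewer can diff the generator rather than the serialized .ipynb JSON.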

Testing

  • UV_CACHE_DIR=/tmp/uv-cache uv run pytest tests/test_quickstart_config.py -q -m 'not cuda'
  • UV_CACHE_DIR=/tmp/uv-cache uv run python scripts/build_quickstart_notebook.py
  • UV_CACHE_DIR=/tmp/uv-cache uv run python -m py_compile scripts/build_quickstart_notebook.py tests/test_quickstart_config.py
  • git diff --check

Notes

I have not yet run the full notebook in a live Colab runtime.

@YuanYuan98

Collaborator Author

@codex, can you review?

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61ec57c1f3

if src not in sys.path:
    sys.path.insert(0, src)

import ocean_emulators  # noqa: F401 (smoke import)

P1: Enforce Python 3.12 before importing project code

The quickstart install cell imports ocean_emulators unconditionally, but it never verifies the interpreter version first. This repository requires Python >=3.12 (see pyproject.toml), and the codebase already uses 3.12-only syntax (for example, type aliases in src/ocean_emulators/constants.py), so users on a Colab runtime that is still Python 3.11 will fail here with a SyntaxError before training even starts. Add a sys.version_info guard (with clear upgrade instructions) before this import to prevent the notebook from breaking at setup time.
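A minimal sketch of such a guard, assuming the 3.12 floor stated above. The `check_python` name and the wording of the error message are illustrative, not taken from the notebook.

```python
import sys

MIN_PYTHON = (3, 12)  # floor quoted in the review comment


def check_python(version_info=None, minimum=MIN_PYTHON):
    """Raise a readable error if the interpreter is older than `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current < minimum:
        raise RuntimeError(
            f"This notebook needs Python {minimum[0]}.{minimum[1]}+, "
            f"but the runtime is {current[0]}.{current[1]}. "
            "Switch to a newer runtime before running the install cell."
        )
    return current


# In the install cell, the guard would run before the project import:
#   check_python()
#   import ocean_emulators  # noqa: F401 (smoke import)
```

Raising before the import turns the SyntaxError described above into a message that names the actual cause.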

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-authored-by: Codex <codex@openai.com>

@alxmrs alxmrs left a comment

Member

I ran out of time yesterday to complete a full review. Here are my pending notes to minimize blocking you.

"## 2. Slim install\n",
"\n",
"Ocean Emulator's full dependency set includes a few packages that are heavy or\n",
"brittle on Colab (`xesmf` needs the ESMF system library; `skypilot`,\n",

Member

This is good for now, but in the future I think we should clean up the dependencies in our pyproject.toml.

"\n",
"`scripts/clone_data.py` pulls from the public OSN bucket\n",
"`https://nyu1.osn.mghpcc.org/m2lines-pubs/Samudra/` — no credentials needed.\n",
"\n",

Member

This is the wrong data source! I'm about to change this. Let's make sure it is fixed before we land this PR!

Comment on lines +183 to +193
" subprocess.run(\n",
" [\n",
" \"python\",\n",
" f\"{REPO_DIR}/scripts/clone_data.py\",\n",
" str(DATA_DIR),\n",
" \"--time_start\", \"0\",\n",
" \"--time_end\", \"290\",\n",
" \"--write_time_chunks\", \"1\",\n",
" ],\n",
" env={**os.environ, \"PYTHONPATH\": f\"{REPO_DIR}/src\"},\n",
" check=True,\n",

Member

I would recommend against subprocess; instead, put ! at the start of a line in the Colab notebook to run shell commands.
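To make the suggestion concrete, here is a sketch of how the argument list in the snippet above maps onto a single shell line. The `REPO_DIR` and `DATA_DIR` values are hypothetical placeholders; the notebook defines its own.

```python
import shlex

# Hypothetical placeholder paths, not the notebook's real values.
REPO_DIR = "/content/repo"
DATA_DIR = "/content/data"

# Same arguments as the subprocess.run call in the diff.
argv = [
    "python",
    f"{REPO_DIR}/scripts/clone_data.py",
    DATA_DIR,
    "--time_start", "0",
    "--time_end", "290",
    "--write_time_chunks", "1",
]

# Shell-quoted equivalent, with the PYTHONPATH override inlined as a prefix.
shell_line = f"PYTHONPATH={shlex.quote(REPO_DIR + '/src')} {shlex.join(argv)}"
print(shell_line)

# In a Colab cell the same call could be written with IPython's {name}
# interpolation:
#   !PYTHONPATH={REPO_DIR}/src python {REPO_DIR}/scripts/clone_data.py {DATA_DIR} --time_start 0 --time_end 290 --write_time_chunks 1
```

One trade-off to keep in mind: a `!` line does not raise on a non-zero exit status the way `check=True` does, so a failed download would not stop the notebook by itself.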

"source": [
"## 4. Train\n",
"\n",
"`configs/quickstart/train.yaml` is the same shape as the production v2 config\n",

Member

Figuring out this config is the hard part; thanks for finding a good small model setup.

"id": "next-steps",
"metadata": {},
"source": [
"## Where to go next\n",

Member

Very nice note.


# T4 (Turing, sm_75) does not support bfloat16; the quickstart config disables
# it. A100/L4/H100 do support it — feel free to flip use_bfloat16 back on.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)

Member

I recommend using Colab's ! commands to run things in the shell.
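Related to the bfloat16 note in the snippet above: rather than asking the reader to interpret the nvidia-smi output, the notebook could derive the flag from the GPU's compute capability. The sketch below is an assumption, not code from this PR. The `supports_bfloat16` helper is hypothetical, and the sm_80 threshold is inferred from the GPUs named in the comment. In a live runtime the tuple could come from `torch.cuda.get_device_capability()`.

```python
def supports_bfloat16(capability):
    """Return True for compute capability 8.0 (Ampere) or newer.

    `capability` is a (major, minor) tuple, e.g. (7, 5) for a T4.
    """
    major, _minor = capability
    return major >= 8


# Compute capabilities of the GPUs mentioned in the notebook comment.
EXAMPLES = {"T4": (7, 5), "A100": (8, 0), "L4": (8, 9), "H100": (9, 0)}

for name, cap in EXAMPLES.items():
    print(f"{name}: use_bfloat16={supports_bfloat16(cap)}")
```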

@alxmrs

alxmrs commented May 4, 2026

Member

I uploaded this to Google Colab, and one error I hit so far is that the notebook cannot clone this repo because it is private.
