crewster

A CLI that makes a remote HPC cluster feel local — sync your working tree, run a quick job (Slurm/PJM), pull results.

Why not Snakemake?

crewster is not a job orchestrator, and does not try to be one.

Its responsibility is the inner development loop against a remote environment: you are actively editing code — often before it is committed — and you want to push the current working tree to a remote HPC environment and run a quick test there, repeatedly and quickly. Snakemake, Nextflow, and similar tools assume a defined, committed pipeline and manage its execution graph; they serve the other end of the lifecycle.

They are complementary, not competing:

  • crewster — frequent pre-commit source sync, single test / verification runs, a tight edit → run → observe loop. Built for a coding agent iterating against a remote environment.
  • A workflow runner — dependency graphs, multi-step pipelines, retries, production runs.

If you need to orchestrate a complex production run, use a real orchestrator — and you can still launch it through crewster (crewster submit "snakemake ..."): crewster handles the transport (sync up, run, pull results) and treats the workflow as opaque user code. crewster owns the transport and the dev loop, never the run graph.

Installation

One-shot execution (no install):

uvx --from git+https://github.com/ultimatile/crewster crewster

Permanent install:

uv tool install git+https://github.com/ultimatile/crewster

Quick Start

# 1. Initialize project
crewster init

# 2. Edit configuration
vim crewster.toml

# 3. Sync files to cluster
crewster sync
crewster sync --dry-run  # preview only

# 4. Submit job
crewster submit "python train.py"

# 5. Check status
crewster status 12345678

# 6. View job output
crewster job-output 12345678

Commands

crewster init

Creates a crewster.toml configuration file in the current directory.

crewster init                      # Slurm template (default)
crewster init --scheduler pjm      # PJM-oriented template

--scheduler selects which scheduler-specific section is written. The default is slurm.

When $XDG_CONFIG_HOME/crewster/config.toml exists, it is used as the source instead of the built-in template. See User-level XDG config for the filter-merge semantics.

crewster sync

Syncs local files to the remote HPC cluster using rsync. Always syncs the entire project root (where crewster.toml is located), regardless of which subdirectory you run from.

crewster sync                # sync files
crewster sync --dry-run      # preview without syncing (-n for short)
crewster sync --workdir /scratch/user/other   # override remote workdir
crewster sync --push         # push only (local → remote)
crewster sync --pull         # pull only (remote → local)
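
For intuition, the sketch below shows one way the [sync] settings could translate into an rsync invocation. It is an assumption for illustration, not crewster's actual implementation: the function name rsync_command and the exact flag mapping are hypothetical, and only standard rsync flags (-a, --checksum, --dry-run, --exclude) are used.

```python
def rsync_command(
    host: str,
    workdir: str,
    project_root: str,
    ignore: list[str],
    compare: str = "checksum",
    dry_run: bool = False,
) -> list[str]:
    """Build a push-style rsync command line (hypothetical mapping)."""
    cmd = ["rsync", "-a"]
    if compare == "checksum":
        # Content-based comparison; without it rsync compares size and mtime.
        cmd.append("--checksum")
    if dry_run:
        cmd.append("--dry-run")
    cmd += [f"--exclude={pattern}" for pattern in ignore]
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    cmd += [f"{project_root.rstrip('/')}/", f"{host}:{workdir.rstrip('/')}/"]
    return cmd


if __name__ == "__main__":
    print(" ".join(rsync_command("myhpc", "/scratch/user/proj", ".", [".git"], dry_run=True)))
```

A pull would swap the source and destination arguments; the sketch covers only the push direction.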

crewster exec

Executes a command directly on the login node (not via scheduler). Useful for setup tasks that need internet access (package installs, dependency downloads).

crewster exec "julia -e 'using Pkg; Pkg.instantiate()'"
crewster exec --script setup.sh
crewster exec --workdir /scratch/user/other "cmake .."

Environment setup ([env] section) is applied automatically. The working directory follows the same CWD-relative logic as crewster submit.

crewster submit

Submits a job to the configured scheduler. Returns both run_id (e.g., 20260109_1234, crewster's local tracking ID) and job_id (scheduler job ID, e.g., 12345678).

The job's working directory is set based on your current position relative to the project root (see Multi-Setup Runs).

crewster submit "python train.py"
crewster submit --script run.sh
crewster submit -s run.sh --wait
crewster submit --workdir /scratch/user/other "python train.py"  # override remote workdir

#SBATCH (Slurm) and #PJM (PJM) directives written at the top of a script passed via --script are honored: crewster hoists them into the prologue of the rendered job script, so they are scanned by sbatch / pjsub instead of being silently treated as comments.

Only column-zero directive lines that appear before the first executable line in the user script are hoisted, matching the schedulers' own prologue-scan rule. Directives after an executable line, or inside heredocs, are left in the body as-is.
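
The prologue-scan rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not crewster's actual code: the function name split_directives is hypothetical, and any non-blank line that is not a comment is treated as the first executable line. Directives inside heredocs need no special handling here, because a heredoc always starts on an executable line.

```python
def split_directives(script_text: str, prefix: str = "#SBATCH") -> tuple[list[str], list[str]]:
    """Split a script into (hoisted directive lines, remaining body lines)."""
    hoisted: list[str] = []
    body: list[str] = []
    in_prologue = True
    for line in script_text.splitlines():
        # Column-zero match only: the prefix must start the line and be
        # followed by whitespace (or nothing), so "#SBATCHX" does not count.
        is_directive = line.startswith(prefix) and line[len(prefix):len(prefix) + 1] in ("", " ", "\t")
        if in_prologue and is_directive:
            hoisted.append(line)
            continue
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # First non-blank, non-comment line ends the prologue.
            in_prologue = False
        body.append(line)
    return hoisted, body


if __name__ == "__main__":
    script = "#!/bin/bash\n#SBATCH --time=01:00:00\necho start\n#SBATCH --partition=gpu\n"
    directives, rest = split_directives(script)
    print(directives)  # ['#SBATCH --time=01:00:00']
```

Passing prefix="#PJM" applies the same scan to PJM scripts.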

When the same option is set both via config ([slurm.options] for Slurm, the pjm.options array for PJM) and via a #SBATCH / #PJM line in the script, the script's value wins (the scheduler's last-occurrence-wins semantics for duplicate directives). The submit_options list is passed as command-line flags to sbatch / pjsub and, per scheduler specifications, overrides script directives unconditionally.

crewster status

Checks the status of a submitted job. Accepts either run_id or job_id.

crewster status 12345678

crewster job-output

Shows the output of a submitted job. Accepts either run_id or job_id.

crewster job-output 12345678

Pass --follow / -f to stream the output of a running job in real time (equivalent to tail -F on the remote output file). Combine with --error / -e to follow stderr instead of stdout. For terminal-state jobs the command prints the final output and exits.

crewster job-output -f 12345678
crewster job-output -f -e 12345678

crewster wait

Waits for a run to complete. Accepts either run_id or job_id.

crewster wait 12345678

Project Root and Config Discovery

crewster walks up from the current directory to find crewster.toml, similar to how git finds .git. This means you can run crewster commands from any subdirectory within your project.

Resolution order: --config / -c > $CREWSTER_CONFIG > walk-up discovery > ./crewster.toml.

For backward compatibility, the legacy $HPC_CONFIG environment variable and an hpc.toml filename are still honored as a read-only fallback, with a deprecation warning printed to stderr. This fallback will be removed in v1.0. If you keep a legacy hpc.toml, add .crewster to [sync] ignore_push so the run-metadata directory is not pushed to the remote.

The directory containing crewster.toml is the project root. This affects:

  • crewster sync: always syncs the entire project root to workdir, regardless of CWD
  • crewster submit: sets the job's cd to workdir + (CWD relative to project root)
  • .crewster/runs/: run metadata is always stored at the project root

crewster init does not walk up — it always creates crewster.toml in the current directory.
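
The resolution order described in this section can be sketched like this. It is a simplified illustration, not crewster's actual code: the function name find_config and its parameters are hypothetical, and the legacy $HPC_CONFIG / hpc.toml fallback is left out.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def find_config(
    start: Path,
    cli_config: str | None = None,
    env: Mapping[str, str] | None = None,
) -> Path:
    """Resolve the config path: --config > $CREWSTER_CONFIG > walk-up > ./crewster.toml."""
    env = os.environ if env is None else env
    if cli_config:
        return Path(cli_config)
    if env.get("CREWSTER_CONFIG"):
        return Path(env["CREWSTER_CONFIG"])
    start = start.resolve()
    # Walk from the starting directory up to the filesystem root, like git does for .git.
    for directory in (start, *start.parents):
        candidate = directory / "crewster.toml"
        if candidate.is_file():
            return candidate
    # Nothing found: fall back to the current directory.
    return start / "crewster.toml"


if __name__ == "__main__":
    print(find_config(Path.cwd()))
```

The parent directory of the returned path is what the rest of this README calls the project root.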

Multi-Setup Runs

When running multiple benchmarks or parameter sets from a single project, use subdirectories. crewster automatically maps your local directory structure to the remote.

myproject/
  crewster.toml         # workdir = "/remote/myproject"
  src/main.py
  runs/
    setup-a/
      input.dat
    setup-b/
      input.dat

# Sync the entire project (same result from any subdirectory)
crewster sync

# Submit from a subdirectory — job runs in the matching remote path
cd runs/setup-a
crewster submit "python ../../src/main.py"
# → job cd's to /remote/myproject/runs/setup-a

cd ../setup-b
crewster submit "python ../../src/main.py"
# → job cd's to /remote/myproject/runs/setup-b

Key points:

  • sync is always project-wide. The remote mirrors your local project structure exactly.
  • submit uses your CWD to determine the job's working directory on the remote.
  • --workdir overrides cluster.workdir for one-off use without editing crewster.toml.
  • Large artifacts that shouldn't be synced are managed via [sync] ignore.
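
The CWD-to-remote mapping used by submit boils down to "workdir plus the CWD relative to the project root". A minimal sketch, with a hypothetical function name (remote_cwd) and example local paths chosen for illustration:

```python
from pathlib import Path, PurePosixPath


def remote_cwd(project_root: Path, cwd: Path, workdir: str) -> PurePosixPath:
    """Map a local directory inside the project to its remote counterpart."""
    # relative_to raises ValueError if cwd is not inside project_root.
    relative = cwd.relative_to(project_root)
    # The remote side is always POSIX, whatever the local platform is.
    return PurePosixPath(workdir).joinpath(*relative.parts)


if __name__ == "__main__":
    root = Path("/home/me/myproject")
    print(remote_cwd(root, root / "runs" / "setup-a", "/remote/myproject"))
    # /remote/myproject/runs/setup-a
```

Passing a different workdir string corresponds to the --workdir override: the relative part stays the same and only the remote base changes.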

Configuration

Edit crewster.toml:

[cluster]
host = "myhpc"                    # SSH host (from ~/.ssh/config)
workdir = "/scratch/user/proj"    # Remote working directory; all code and data will be synced here
scheduler = "slurm"                # "slurm" (default) or "pjm"

[env]
modules = ["gcc/12.2.0", "cuda/12.2"]  # Modules to load (shorthand for module load)
spack = ["python@3.11"]                # Spack packages to load (shorthand for spack load)
setup = [                              # Additional setup commands
    {source = "/path/to/venv/bin/activate"},
    {export = ["VAR=value"]},          # {command = [args...]} format
    "some_cmd",                        # String: command without args
]

[sync]
ignore = ["crewster.toml", ".git"]  # Patterns to exclude from sync
compare = "checksum"           # File comparison: "checksum" (content-based, default) or "timestamp"
pull_dir = "~/data/myproj"     # Pull destination (default: project root). Useful for keeping git repo clean

[slurm.options]
partition = "gpu"      # Example (Slurm): partition
time = "02:00:00"      # Example (Slurm): time limit
mem = "32G"            # Example (Slurm): memory
gpus = 1               # Example (Slurm): number of GPUs

Environment Setup

Commands are executed in this order: modules → spack → setup.

modules and spack are shorthand syntax:

  • modules = ["gcc/12.2.0"] expands to module load gcc/12.2.0
  • spack = ["python@3.11"] expands to spack load python@3.11

setup accepts:

  • String: command without args (e.g., "some_cmd")
  • Dict: {command = args} format (e.g., {export = ["VAR=value"]} → export VAR=value)
  • Special commands module and spack in dict format expand to module load / spack load

If you need a different execution order, put everything in setup:

[env]
setup = [
    {spack = "python@3.11"},
    {module = "gcc/12.2.0"},
    {source = "/path/to/venv/bin/activate"},
]

Shell special characters (;|&`$<>\'"\n and space) are prohibited in arguments for security.
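
To make the expansion rules concrete, here is a minimal sketch of how an [env] table could be turned into shell lines. It illustrates the documented behavior and is not crewster's actual code: the names render_env, check_arg, and FORBIDDEN are hypothetical, and the input is assumed to be the already-parsed TOML table.

```python
# Characters the README lists as prohibited in arguments.
FORBIDDEN = set(";|&`$<>\\'\"\n ")


def check_arg(arg: str) -> str:
    """Reject arguments containing shell special characters."""
    if any(ch in FORBIDDEN for ch in arg):
        raise ValueError(f"forbidden shell character in argument: {arg!r}")
    return arg


def render_env(env: dict) -> list[str]:
    """Expand an [env] table into shell lines: modules, then spack, then setup."""
    lines = [f"module load {check_arg(m)}" for m in env.get("modules", [])]
    lines += [f"spack load {check_arg(s)}" for s in env.get("spack", [])]
    for item in env.get("setup", []):
        if isinstance(item, str):
            # Plain string: a command without arguments.
            lines.append(check_arg(item))
            continue
        for command, args in item.items():
            if isinstance(args, str):
                args = [args]
            if command in ("module", "spack"):
                # Special commands expand to "module load" / "spack load".
                command = f"{command} load"
            lines.append(" ".join([command, *(check_arg(a) for a in args)]))
    return lines


if __name__ == "__main__":
    for line in render_env({"modules": ["gcc/12.2.0"], "setup": [{"export": ["VAR=value"]}]}):
        print(line)
```

Rejecting the characters outright, rather than quoting them, keeps the rendered lines predictable at the cost of disallowing arguments that contain spaces.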

PJM Configuration

For the PJM scheduler, specify options as an array of [flag, value] pairs (or a single-element [flag] for flags without a value):

[cluster]
scheduler = "pjm"

[pjm]
options = [
    ["-L", "node=12"],
    ["-L", "rscgrp=small"],
    ["-L", "elapse=00:30:00"],
    ["--mpi", "max-proc-per-node=4"],
    ["-g", "laa4Hoo5"],
    ["-s"]
]
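
As an illustration of why the array format is used, the sketch below renders each entry as one #PJM directive line. This rendering is an assumption, not confirmed crewster behavior, and render_pjm_directives is a hypothetical name. Unlike a TOML table, the array lets a flag such as -L repeat and keeps the entries in order.

```python
def render_pjm_directives(options: list[list[str]]) -> list[str]:
    """Render [flag] or [flag, value] entries as #PJM directive lines (assumed format)."""
    lines = []
    for entry in options:
        if not 1 <= len(entry) <= 2:
            raise ValueError(f"expected [flag] or [flag, value], got {entry!r}")
        lines.append("#PJM " + " ".join(entry))
    return lines


if __name__ == "__main__":
    for line in render_pjm_directives([["-L", "node=12"], ["-L", "rscgrp=small"], ["-s"]]):
        print(line)
```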

User-level XDG config

$XDG_CONFIG_HOME/crewster/config.toml (default: ~/.config/crewster/config.toml), when present, is used as the source for crewster init instead of the built-in template. The file is filter-merged onto the chosen scheduler:

  • The inactive scheduler's top-level section ([pjm] under --scheduler slurm, [slurm] under --scheduler pjm) is dropped.
  • cluster.scheduler is forced to match the --scheduler argument.
  • All other sections (including unknown ones) carry over with their parsed TOML values intact.

The source XDG file is not modified. This lets the XDG file carry both [slurm] and [pjm] sections side by side so that crewster init --scheduler {slurm,pjm} projects out the active half. Because the file goes through tomllib.load and tomli_w.dump, comments and original formatting (e.g. inline-array layout) are not preserved in the generated crewster.toml; only the parsed data is.
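
The filter-merge rules can be sketched on plain dictionaries, as they would look after tomllib.load. This is a simplified illustration with a hypothetical function name (filter_merge), not crewster's actual code.

```python
import copy

SCHEDULERS = ("slurm", "pjm")


def filter_merge(xdg_config: dict, scheduler: str) -> dict:
    """Project a parsed XDG config onto the chosen scheduler."""
    if scheduler not in SCHEDULERS:
        raise ValueError(f"unknown scheduler: {scheduler!r}")
    # Work on a deep copy so the source data is never modified.
    result = copy.deepcopy(xdg_config)
    for name in SCHEDULERS:
        if name != scheduler:
            # Drop the inactive scheduler's top-level section.
            result.pop(name, None)
    # Force cluster.scheduler to match the --scheduler argument.
    result.setdefault("cluster", {})["scheduler"] = scheduler
    return result


if __name__ == "__main__":
    xdg = {"cluster": {"host": "myhpc"}, "slurm": {"options": {"partition": "gpu"}}, "pjm": {"options": [["-s"]]}}
    print(sorted(filter_merge(xdg, "pjm")))  # ['cluster', 'pjm']
```

Unknown sections pass through untouched because the function only ever removes the inactive scheduler's key.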

Requirements

  • Python 3.11+
  • SSH access to HPC cluster (key-based authentication recommended)
  • rsync
  • Slurm or PJM on the remote cluster

rsync Note

rsync from https://rsync.samba.org/ is recommended over macOS's built-in openrsync. With checksum-based comparison (compare = "checksum", the default), openrsync has a bug: files whose size is an exact multiple of 64 bytes are always detected as changed, even when identical. This is due to a protocol 29 checksum boundary issue, confirmed with the openrsync shipped in macOS 15.7 (protocol version 29, rsync version 2.6.9 compatible). If you cannot install another rsync, set [sync] compare = "timestamp" instead.

On macOS, install rsync via Homebrew:

brew install rsync
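
To gauge whether a project is exposed to the openrsync issue described above, a small helper can list the files whose size is an exact multiple of 64 bytes. This is a hypothetical diagnostic, not part of crewster. It skips empty files, since the note above does not say whether they are affected.

```python
from pathlib import Path


def files_with_size_multiple_of(root: Path, block: int = 64) -> list[Path]:
    """Return non-empty files under root whose size is an exact multiple of block bytes."""
    matches = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size > 0 and size % block == 0:
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    for path in files_with_size_multiple_of(Path(".")):
        print(path)
```

The helper does not apply the [sync] ignore patterns, so its list may include files that are never synced.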

Claude Code Integration

This project includes a Claude Code skill (.claude/skills/crewster/SKILL.md) that teaches Claude how to use the crewster CLI. The CLI reference in the skill is dynamically generated via crewster --skill to stay in sync with the code.

Development

make test      # run tests
make lint      # run linter
make check     # run all checks
