Skip to content

feat(server): plan for auth and read-only mode (#312)#334

Open
le-yams wants to merge 4 commits into
cooklang:mainfrom
le-yams:feat/server_auth_and_read_only_mode
Open

feat(server): plan for auth and read-only mode (#312)#334
le-yams wants to merge 4 commits into
cooklang:mainfrom
le-yams:feat/server_auth_and_read_only_mode

Conversation

@le-yams

@le-yams le-yams commented Apr 29, 2026

Copy link
Copy Markdown

Hello there,

This PR addresses cooklang/cookcli#312 — making the cook server write operations require authentication, while keeping anonymous read access available by default.

⚠️ This PR currently contains only the design/plan document — no implementation yet. I'm opening it as a draft to gather feedback on the approach before I start coding. Once the plan is validated I'll push the implementation in follow-up commits on this same branch.

What's in here

A single new file: docs/plans/2026-04-29-server-auth-plan.md, structured as:

  • Threat model & non-goals
  • Configuration format (server.toml with mandatory plain: / bcrypt: prefixed password) and an extensible PasswordVerifier trait with a plain Password enum (extended later by adding a variant)
  • CLI surface (--auth, --no-auth, --auth-config --enable-auth, --server-config, plus cook server hash-password subcommand)
  • Three AuthMode states (Authenticated / ReadOnly / Disabled) and a resolution table Two AuthMode states (Authenticated / Disabled) and a resolution table; --enable-auth without credentials is a startup error rather than a third ReadOnly variant
  • Server-side persistent sessions (HashMap + JSON file, 7-day TTL)
  • Full read/write classification of every API and UI route
  • Frontend changes (template AuthContext, hidden write actions, login page)
  • A 7-phase implementation breakdown A 6-phase implementation breakdown where each phase pairs code with the tests that lock its behaviour
  • Definition of done

Notable design decisions (vs. the issue)

  • Default mode is Disabled (legacy behavior preserved with a console warning) instead of ReadOnly, to avoid a breaking change for existing users. Switching to read-only requires explicit --auth Enabling protection requires explicit --enable-auth together with credentials in server.toml; passing the flag alone aborts at startup.
  • Mandatory algorithm prefix on the password field (no auto-detection) to remove ambiguity.
  • Local auth and CookCloud sync stay separate: the existing /api/sync/* flow is unchanged, just protected by the new local auth.

Looking for feedback on

  • Overall approach and architecture
  • The default-mode decision (Disabled vs strictly following the issue's ReadOnly)
  • Route classification — anything I'm miscategorizing?
  • Whether the PasswordVerifier trait abstraction is overkill for shipping just plain + bcrypt dropped the trait in favor of a plain Password enum
  • Any prior art / preferences in the codebase I might have missed

Disclosure

I'm new to Rust, so I'm using Claude Code to help me explore the codebase, design the plan, and (later) write idiomatic Rust. All design decisions and the final wording of this plan are my own — I've reviewed everything carefully — but it's only fair to flag that a coding assistant is in the loop.

le-yams and others added 3 commits April 30, 2026 17:18
Replace the prior design doc (server-auth-readonly-design.md) with a
single forward-looking plan where each phase pairs code with the tests
that lock its behavior. Reflects the final CLI (--enable-auth as the
source of truth, --server-config, mandatory plain:/bcrypt: password
prefix) and a deliberately small architecture — two modes
(Authenticated / Disabled), --enable-auth without creds errors out at
startup, password verifier as an enum, require_auth mounted before
the /api nest to sidestep OriginalUri pitfalls.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@le-yams le-yams force-pushed the feat/server_auth_and_read_only_mode branch from 181bc41 to a38c29e Compare April 30, 2026 16:10
@le-yams

le-yams commented Apr 30, 2026

Copy link
Copy Markdown
Author

Hello again,

I've reworked the plan to fold the test strategy in and simplify the CLI.

CLI simplification: Single --enable-auth flag as the source of truth (drops --auth / --no-auth); --server-config (was --auth-config); presence of server.toml alone never activates auth.

Architecture simplification: Two modes only — Authenticated / Disabled; --enable-auth without creds is a startup error rather than a third ReadOnly variant. Password is a plain enum (Plain / Bcrypt) instead of a trait + factory. require_auth is mounted before the /api nest so req.uri() works directly, removing the OriginalUri regression test. Session TTL hardcoded to 7 days; hash-password is bcrypt-only; auth strings stay inline (no .ftl churn until a translation lands).

@romainhild

Copy link
Copy Markdown
Contributor

Hello, I'm really interested in this since I'm not feeling good about letting anyone write on my server or delete recipes.
I'm no expert but your design looks good. I think it is important that by default the server keep its current behavior.
It is nice to add the possibility to keep only the password encrypted and nicer to add the cli to do it.

When do you think you will be able to implement this ? (No pressure intended 😄 )

@romainhild

Copy link
Copy Markdown
Contributor

For info, I asked Claude to review the plan, here is his comments:


Strengths

  • --enable-auth as sole source of truth is a clean design that avoids the "stale config surprises" problem. The four-row resolution table in §4.2 is
    unambiguous.
  • Phased approach is excellent — each phase leaves the tree green and adds testable behavior incrementally.
  • Security checklist (§10) covers the fundamentals correctly: constant-delay login, generic error messages, 256-bit tokens, CSRF on all non-GET writes.
  • ServerSpawn builder pattern is more maintainable than the alternative of four spawn_* functions.
  • COOK_SESSION_FILE env var for test hermeticity is the right approach — avoids hidden CLI flags.
  • bcrypt cost=4 in tests is explicitly called out — prevents a common CI slowness trap.

Issues Worth Fixing

  1. AuthConfig stores password: String but Password enum is never stored

The AuthConfig struct holds the raw prefixed string:
pub struct AuthConfig {
pub username: String,
pub password: String, // raw prefixed value, parsed at startup
}

But AuthState.config: Option means the password is never the parsed Password enum at rest — verify() would need to re-parse on every login
attempt. Either:

  • Store the parsed Password enum instead of (or alongside) the raw string in AuthState, or
  • Make clear that build_state calls Password::parse() and stores the result separately.

This is the biggest ambiguity in the architecture section.

  1. Port TOCTOU race in pick_free_port

"Bind to :0, read the port, drop the listener, hand the port to the child process" — between dropping the listener and the child binding, another process can
grab the port. This causes intermittent CI failures in parallel test runs. If the server supports --port 0 (which Axum/tokio easily can), have the child
bind to :0, then read the actual port from stdout. This avoids the race entirely.

  1. Session file not isolated by default in tests

The with_session_path builder method sets a custom path, but the default isn't specified. If tests don't call with_session_path, they may share the
developer's real ~/.config/cook/server-sessions.json. The same hermetic logic applied to --server-config should apply to the session file: set
COOK_SESSION_FILE to a temp path by default in ServerSpawn::spawn().

  1. sanitize_next test coverage is incomplete

The plan tests next=//evil.com open redirect, but doesn't cover:

  • next=%2F%2Fevil.com (URL-encoded double-slash)
  • next=http:// (explicit scheme)
  • next=// (protocol-relative, no host yet)

These are the common bypass vectors for open-redirect guards. Add them to sanitize_next_basic.

  1. Timing test (login_constant_delay) is flaky by nature

"both paths ≥ 250 ms" — this will occasionally fail on a loaded CI machine that takes > 250 ms on unrelated overhead, or pass when the server is slow for
unrelated reasons. Consider either:

  • Only asserting the delay is >= threshold (not an exact window), and marking the test slow/flaky-tolerant in CI, or
  • Testing the behavior (constant delay regardless of valid/invalid user) without asserting a specific wall-clock time.

Minor Points

  • /api/reload as read (§5.1): Allows anonymous users to trigger cache refreshes. This is a low-severity DoS vector on large recipe collections. Not a
    blocker, but worth a note in the threat model.
  • Session lazy cleanup: In-memory sessions are only purged on access/load. If the server runs for months without restarts, the in-memory map and JSON file
    can grow. Fine for single-user, but should be noted as a known limitation (like rate-limiting is).
  • AuthContext field auth_enabled: Could be a derived method instead of a bool field — mode == AuthMode::Authenticated. Reduces risk of them getting out of
    sync if AuthMode is changed.
  • Phase 6 (docs) has no test: Add at least a --help check for the new CLI flags to the integration suite. Docs can drift; CLI help output is always in sync.

Verdict

The plan is implementable as-is. The Password storage ambiguity (issue # 1) and session file isolation (issue # 3) should be clarified before Phase 1 begins,
or they'll surface as bugs during implementation. The rest are polish items.

@le-yams

le-yams commented Jun 17, 2026

Copy link
Copy Markdown
Author

Hello @romainhild ,

When do you think you will be able to implement this ? (No pressure intended 😄 )

It is my first contribution to this repository (and, as I said, I'm beginner with rust) so I was waiting for feedback and comments from people familiar with the codebase 🙂.

For info, I asked Claude to review the plan, here is his comments

I'll take a look at it and adjust the plan accordingly, thank you for your feedback 🙏.

le-yams pushed a commit to le-yams/cookcli that referenced this pull request Jun 17, 2026
Address the review on PR cooklang#334:
- store the parsed Password in AuthState (fail-fast at startup, no re-parse)
- default COOK_SESSION_FILE to a temp path in tests for hermeticity
- discover the test port via `--port 0` + stdout (drop the racy pick_free_port)
- make the login delay injectable (COOK_LOGIN_DELAY_MS) for a deterministic
  constant-delay test
- add the encoded `%2F%2Fevil.com` open-redirect vector to sanitize_next tests
- model AuthContext as an enum so illegal auth states are unrepresentable
- note the wildcard CORS / cookie-auth interaction and the /ws/lsp CSRF caveat
- scope fail-fast password parsing to --enable-auth only
- document session-store growth as a known limitation
- add a `cook server --help` assertion in Phase 6

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@le-yams le-yams force-pushed the feat/server_auth_and_read_only_mode branch from f546ea0 to 3f106c9 Compare June 17, 2026 15:30
@le-yams

le-yams commented Jun 17, 2026

Copy link
Copy Markdown
Author

Follow-up to my earlier reply, this lands with the commit that revises the plan, so most points are now in the diff. A few things worth reading beyond it:

  • Timing test (#5): a lower-bound assertion (≥ 250 ms) on a constant sleep can't false-fail (extra CI overhead only pushes it higher so it's weak rather than flaky). That said, your suggestion to test the constant-delay behavior was worth taking, so I've made the delay injectable and the test now asserts it deterministically 🙂.
  • New points I surfaced while applying the changes:
    • the existing wildcard CorsLayer coexisting with cookie auth;
    • /ws/lsp is a GET upgrade that the non-GET Origin check skips (CSWSH, covered by SameSite=Lax);
    • fail-fast password parsing scoped to --enable-auth only, so a stale server.toml can't crash a legacy start.
  • Minors:
    • session-store growth (lazy purge, no GC) noted as a known limitation;
    • Phase 6 gains a cook server --help assertion so the flag surface can't drift;
    • took your auth_enabled point further: AuthContext is now an enum (Disabled | Anonymous | SignedIn { username }) built in one place, so the illegal state combinations are unrepresentable by construction rather than guarded by tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants