Architecture review fixes: PTY starvation, persistence durability, remote/mobile leaks#134
Open
matej21 wants to merge 23 commits into
Open
Architecture review fixes: PTY starvation, persistence durability, remote/mobile leaks#134matej21 wants to merge 23 commits into
matej21 wants to merge 23 commits into
Conversation
normalize() is the canonical cleanup run on every workspace load and after every split/move, but it never clamped Tabs.active_tab against children length, letting a stale/out-of-range index leak through to downstream children[active_tab] indexing and panic. Enforce the invariant at this single choke point. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
save_settings_locked wrote the temp file with fs::write and renamed without an fsync, so on a crash/power-loss the rename could be ordered before data hit stable storage, leaving a zero-length or stale settings.json. Mirror the workspace save path: File::create + write_all + sync_all() before rename. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
save_session and export_workspace wrote serialized JSON directly with fs::write, so a crash or disk-full mid-write left truncated JSON at the canonical path with no tmp+rename and no backup; save_session overwrites in place, so an interrupted save destroyed the prior good copy. Route both through a shared atomic_write_json helper (tmp + fsync + rename). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e-based .bak Two durability fixes in save/load_workspace: 1. (high) The empty-workspace guard set LOADED_FROM_DEFAULT=true and only ever cleared it on a successful load, so deleting the last project permanently disabled ALL persistence for the session (silently, save returned Ok) and reverted on next launch. Stop latching on empty save: only skip an empty save when we already loaded from a fallback default this session (startup glitch); a genuine user delete after a successful load now persists. Layer 1 still protects the disk on real load failures. 2. (medium) The rolling .bak was a non-atomic fs::copy that a crash could truncate (and recovery restores from .bak). Back up via atomic rename instead, and guard the corrupt-deserialize path so it stashes to .corrupt rather than clobbering a .bak that is now the sole surviving good copy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
close_terminal hand-rolled the remove-child/collapse-parent/clamp-active_tab logic that LayoutNode::remove_at_path already implements (and tests_simulate had a third copy). The three copies had drifted, so the well-tested remove_at_path_* unit tests didn't actually guard the shipped close path. Delegate the structural mutation to remove_at_path; keep only the close-specific concerns in close_terminal: the last-pane -> bookmark/None-layout branch, and the active_tab focus-preservation delta (keep the active tab on the same content when closing a tab before it) which remove_at_path doesn't do. simulate_close now delegates too. Adds coverage for the previously-untested close-before-active-tab case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The state_changed handler built a fresh reqwest::Client::new() on every event, discarding the HTTP connection pool/keepalive per discrete server change. Reuse the session-scoped client already constructed in ws_session (reqwest::Client is Arc-internally, so it shares the pool). Request/auth/diff logic unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…re parse Two concurrency fixes in the central PTY event loop on the GPUI thread: 1. (high) The loop drained the whole bounded channel and ran the full ANSI/VTE parse synchronously for every event before returning to the executor, with no yield or work budget. A single high-bandwidth terminal (cat hugefile, yes, runaway build log) kept the UI thread inside the drain loop, starving input, render, resize and parsing for every other terminal. Add a per-turn byte budget (256 KiB) and smol::future::yield_now() so the executor schedules input/render between chunks; undrained events stay queued and progress next turn (nothing dropped). 2. (medium) process_output ran while the terminals-registry mutex was held, so a keystroke/resize/kill on a different terminal blocked behind a large parse. Clone the Arc<Terminal> out and drop the registry lock before parsing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ding UI hint The remote server has no TLS (plain http/ws). When bound beyond loopback the bearer token and entire PTY stream (passwords, SSH keys, typed secrets) cross the network in cleartext, and the settings UI casually suggested 0.0.0.0 with no warning. Emit a prominent log::warn! on any non-loopback bind, and reword the UI hint to spell out the unencrypted-exposure risk. Non-breaking: defaults and bind behavior unchanged. Full TLS (wss + cert pinning in pairing) is a follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
add_connection always minted a fresh UUID and inserted a MobileConnection, but nothing ever removed entries: disconnect() only marked status and cleared the state cache, leaving the RemoteClient, terminals grid, and event task alive forever. Every connect/reconnect/re-pair accumulated a new entry — an unbounded leak of tasks and alacritty grids on a memory-constrained device. Add an idempotent remove_connection (tears down WS task + event task, drops the entry), and have add_connection replace any existing entry for the same host:port before inserting. Locks are released before teardown to avoid deadlock; fresh-UUID contract to Dart preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ate table Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…last active tab close_terminal's focus-preservation step decremented the post-removal active_tab unconditionally. But remove_at_path already clamps active_tab when it ran past the end — which happens precisely when the active tab was the LAST one — so closing a tab before a last active tab decremented twice and landed focus one tab too far left (e.g. Tabs [t1,t2,t3] active=t3, close t1 -> focus jumped to t2 instead of staying on t3). Capture the original active_tab before removal and set the new index absolutely (prev_active - 1, clamped) instead of decrementing the already-clamped value. Adds a regression test for the last-active-tab case (the existing test only covered a middle active tab, where the clamp doesn't fire). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ned) Stage 1 of remote TLS: server side. Adds an opt-in setting remote_tls_enabled (default off, backward-compatible). When on, the server loads (or generates+persists) a self-signed cert in the config dir (remote_cert.pem / remote_key.pem) and serves axum over rustls via axum-server; the cert is reused across restarts so the pinned fingerprint stays stable. The cert's SHA-256 fingerprint (lowercase hex) is computed, logged at startup, exposed via RemoteInfo, and written into remote.json (tls flag) so clients and the UI can surface it for out-of-band verification during pairing. Cert setup failure does NOT silently fall back to plaintext — the server stays off so TLS can't be silently downgraded. Non-loopback-without-TLS still warns loudly; the warning now points users at the TLS setting. Follow-up stages: okena-core client pinned verifier (TOFU + pin), desktop pairing/connect UI for fingerprint verification, mobile FFI plumbing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d client Stage 2 of remote TLS: the okena-core client (used by both desktop and mobile). Adds a PinnedCertVerifier (rustls 0.23 ServerCertVerifier) that trusts the self-signed server cert purely by its pinned SHA-256 fingerprint: None pin => trust-on-first-use (accept + record), Some pin => require exact match (defeats MITM). Handshake signatures are still verified against the presented cert's key via the aws_lc_rs provider, so replaying the pinned cert without its private key fails. A single rustls ClientConfig drives both reqwest (use_preconfigured_tls) and tokio-tungstenite (Connector::Rustls), so http+ws share one verifier. RemoteConnectionConfig gains `tls` and `pinned_cert_sha256` (both serde-default for back-compat) plus base_url()/ws_url() scheme helpers. All connection paths (health, pair, ws_session, state fetch, token refresh) now route through the TLS-aware builders. The fingerprint observed during the pairing handshake rides back on TokenObtained.cert_fingerprint; the desktop and mobile managers persist it as the pin. Pure pin-decision is unit-tested (TOFU capture, match, mismatch, case-insensitive). Follow-up: desktop connect/pair UI (TLS toggle + fingerprint verification), mobile FFI + flutter_rust_bridge regen. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rint display Stage 3 of remote TLS: desktop UI. Server side (settings): add an "Encrypt with TLS" toggle under Remote Server (set_remote_tls_enabled) and, when on, display the running server's cert fingerprint (SHA-256, colon-grouped) read from RemoteInfo so the user can read it out to verify against a pairing client. Client side (connect dialog): add an "Encrypt" toggle (use_tls). The dialog's own health-check + pair requests now route through okena-core's TLS-aware client with the correct https scheme; on a TLS pair it captures the cert fingerprint observed during the handshake (TOFU) and pins it onto the saved RemoteConnectionConfig before emitting Connected, so every later connect enforces the pin. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stage 4 of remote TLS: mobile Rust. add_connection now takes a `tls` flag and sets it on the RemoteConnectionConfig, and the TokenObtained handler pins the cert fingerprint captured during the handshake (already in stage 2). The whole native stack is TLS-capable. The FFI `connect()` still hardcodes tls=false: exposing the toggle changes the FFI signature, which requires `flutter_rust_bridge_codegen generate` (Flutter toolchain) plus a Dart toggle + fingerprint-verification UI. Left as a marked TODO so the build stays green without the codegen pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Real rustls server using our generated self-signed cert, against the real okena-core pinned client. Asserts the three behaviours the desktop→desktop flow relies on: 1. TOFU (no pin) connects and captures exactly the server's fingerprint 2. correct pin connects 3. wrong pin is rejected (MITM / cert-swap defense) Verifies cert generation, axum-server rustls serving, and the client PinnedCertVerifier interoperate over an actual handshake — the part that can't be exercised by the pure unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ent) Closes the first-connect MITM window in the desktop client. After a TLS pair the dialog no longer silently trusts the captured cert: it holds the config back, shows the captured fingerprint (colon-grouped), and asks the user to verify it matches the host (Settings → Remote Server) before the same primary button — now "Confirm & Connect" — emits Connected and persists the pin. Cancel discards the pending config so nothing is saved. Plain-http pairs are unaffected (nothing to verify, connect immediately). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-TLS) remote_tls_enabled now defaults to true in AppSettings::default(), so a fresh install is secure-by-default once the remote server is enabled. Existing settings.json files predate the key and serde's per-field #[serde(default)] fills them with false, so already-configured plain-http users are NOT silently switched to TLS (which would break their existing clients). Covered by a test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tp fallback)
Replaces the manual Encrypt toggle with auto mode: the dialog probes /health
over https first, then plain http, and adopts whichever responds — preferring
the secure scheme while still connecting to existing plain-http servers. The
detected scheme is written onto the saved config (config.tls), so reconnects
use it. Test Connection reports which scheme answered ("…, TLS" vs
"…, plaintext"). On a TLS detect, the pairing still captures + verifies the
cert fingerprint before pinning.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onnections Plain-http connections now show a "⚠ no TLS" badge in the sidebar's remote list. Their right-click menu gains an "Upgrade to TLS" item (hidden once a connection already uses TLS) which flips the saved connection's tls flag (and clears any stale pin) via RemoteConnectionManager::set_connection_tls, then opens the pair dialog — the re-pair runs over TLS and pins the server cert. Threads the connection's tls state through the context-menu request so the menu can decide whether to offer the upgrade. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ntHello) Replaces axum-server with a custom accept loop that peeks the first byte of each connection (0x16 = TLS ClientHello) and routes it to either the rustls acceptor or plain HTTP, serving both on the same port via hyper-util's auto builder. This means enabling TLS no longer breaks already-paired plain-http clients — they keep working while new/auto clients negotiate TLS — so TLS can become the default without a flag-day migration. ConnectInfo (peer IP, used by pairing rate-limit) is injected per connection. Graceful shutdown is driven by the same watch channel. tls.rs gains server_config() building a rustls ServerConfig from the persisted cert. The integration test now drives the real dual-stack server and asserts plain http, pinned https (TOFU capture), and wrong-pin rejection all on one port. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n by default Client (okena-core RemoteClient::connect) now prefers TLS: a pinned/TLS connection only tries TLS (no downgrade), while a legacy plain connection probes https first and falls back to http. When a previously-plain connection reaches the server over TLS it auto-upgrades — adopts TLS, pins the cert (TOFU), and emits the new ConnectionEvent::TlsUpgraded so the desktop and mobile managers persist tls=true + the pin (sidebar warning clears, pin enforced next time). With the dual-stack server in place this is safe, so remote_tls_enabled now defaults to ON for existing configs too (serde default = true), not just fresh installs. Existing plain-http clients keep working (server still accepts http) and transparently upgrade to TLS on their next connect. Explicit opt-out is still honored. Test updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CI's check job runs clippy with -D warnings; expect_used is denied. Annotate the one infallible expect (aws_lc_rs always supports the default protocol versions) with the project's #[allow(..., reason)] convention. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
b3a8e86 to
3fd4112
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Architecture review fixes + remote server TLS
Implements the High findings and the highest-value low-risk Medium findings from a 10-area parallel architecture review, and turns the originally-deferred remote TLS item into a full implementation (server + desktop client + mobile Rust). Each fix is an isolated commit; full workspace builds, clippy (
-D warnings) is clean, and all touched test suites pass (okena-layout56,okena-workspace276,okena-coreclient 71, plus a new TLS handshake integration test).High
src/app/mod.rs) — the central PTY event loop drained the whole bounded channel and ran the full ANSI/VTE parse synchronously on the GPUI thread with no yield, so one high-bandwidth terminal (cat hugefile,yes, a runaway build log) starved input/render/resize and every other terminal. Added a 256 KiB per-turn byte budget +yield_now(); undrained events stay queued (nothing dropped). Also drops the terminals-registry lock before parsing (a keystroke/resize/kill on another terminal no longer blocks behind a large parse).persistence.rs) — deleting the last project setLOADED_FROM_DEFAULTand permanently disabled all saves for the session (silently,savereturnedOk), reverting on next launch. Empty-save is now skipped only when we actually loaded from a fallback default this session; a genuine user delete persists.settings.rs) — addedfsyncbefore the atomic rename (mirrors the workspace save path).actions/layout/close.rs) — close hand-rolled the remove/collapse/clamp logic thatLayoutNode::remove_at_pathalready implements (andtests_simulatehad a third copy), so the well-testedremove_at_path_*unit tests didn't guard the shipped path. Delegated toremove_at_path; kept the last-pane→bookmark branch and the active-tab focus-preservation delta. Follow-up fix: the focus-preservation step double-decrementedactive_tabwhen closing a tab before a last active tab (it overlappedremove_at_path's own clamp), landing focus one tab too far left — now sets the index absolutely, with a regression test for the last-active-tab case.remote/server.rs, settings UI) — the remote server had no TLS; bound beyond loopback, the token + entire PTY stream (passwords, SSH keys) crossed the network in cleartext. Shipped as a loudwarn!+ reworded UI hint first, then fully fixed by the TLS work below.mobile/native/src/client/manager.rs) — connections were never removed from the manager map, leakingRemoteClients, alacritty grids, and event tasks on every reconnect/re-pair. Addedremove_connectionand replace-by-host:port inadd_connection.Medium
active_tabclamp —normalize()now clampsTabs.active_tab(the single choke point) so a stale/out-of-range index can't reachchildren[active_tab]and panic.sessions.rs) —save_session/export_workspaceused non-atomicfs::write; routed through a sharedtmp + fsync + renamehelper..bak(persistence.rs) — rolling backup was a non-atomicfs::copya crash could truncate; now an atomic rename, with the corrupt-deserialize path guarded so it can't clobber the sole surviving good copy.okena-coreremote client) — stop building a freshreqwest::Clientperstate_changedevent; reuse the session client (shares the connection pool).okena-ext-githubto the crate table.Remote server TLS (https/wss + certificate pinning)
Encrypts the remote control channel end-to-end. Trust model is self-signed cert + Trust-On-First-Use pinning with out-of-band fingerprint verification (SSH-style) — appropriate for a LAN tool with no CA. Rolled out secure-by-default without breaking existing plain-http setups.
src/remote/tls.rs,serve.rs,server.rs) — generates and persists a self-signed cert in the config dir (remote_cert.pem/remote_key.pem), reused across restarts so the pinned fingerprint stays stable. Serving is dual-stack on a single port: a custom accept loop peeks the first byte (0x16= TLS ClientHello) and routes each connection to either the rustls acceptor or plain HTTP (via hyper-util's auto builder). So enabling TLS does not break already-paired plain-http clients — they keep working while new/auto clients negotiate TLS. The cert's SHA-256 fingerprint is logged, surfaced in Settings, and written toremote.json.okena-core— shared by desktop + mobile) — a rustlsPinnedCertVerifiertrusts the server cert purely by its pinned SHA-256 (None→ TOFU capture,Some→ exact match, no downgrade), while still verifying the handshake signature against the presented key (aws_lc_rs). OneClientConfigdrives both reqwest and tokio-tungstenite.RemoteConnectionConfiggainstls+pinned_cert_sha256(serde-default, back-compatible). On connect the client auto-detects scheme (https-first, http fallback) and auto-upgrades a previously-plain connection to TLS, pinning the cert and persisting it.remote_tls_enableddefaults on (including for existing configs missing the key); safe because the server is dual-stack. Explicit opt-out is honored. Verified across a client × server × version × settings compatibility matrix; the key path (new TLS server × old/plain client on one port) is covered by an integration test that drives the real dual-stack server and asserts plain http, pinned https (TOFU capture), and wrong-pin rejection.tls+ pins the cert); exposing the toggle in the Flutter UI needs aflutter_rust_bridgecodegen pass and is left as a marked TODO.Deferred (flagged, larger follow-ups)
PROTO_VERSIONnegotiation + typedWsOutboundparsing on the client;refresh_windows()invalidation scoping;WorkspaceData.projectsVec→index; extension failure-isolation; enablingclippy::unwrap_usedworkspace lint; mobile Flutter TLS toggle (Rust side ready, needs codegen + UI).🤖 Generated with Claude Code