opt(#219): dissolve the inlined seam — windowed SROA + carrier-forwarding + narrow-local by avrabe · Pull Request #252 · pulseengine/loom

avrabe · 2026-07-01T21:11:16Z

Closes #219.

Brings the fully-built seam-dissolution optimization up to date with main (v1.1.17) and certifies it via the #238 behavioral differential gate — the agreed substitute for the single-board G474RE silicon gate it was previously parked on.

What it does

Windowed SROA + instruction-level carrier-forwarding + narrow-local analysis dissolve the void-return seam left by inlining. On the gale repro (sem.loom.wasm) the seam collapses 5254 → 1888 bytes.

Brought up to date with main (merge, one resolution pass)

verify.rs — no conflict; opt: dissolve the seam fully — post-inline SROA/scalar-forwarding eliminates the u64 ABI pack/unpack (+ wasm-local mem2reg) → toward LLVM-LTO parity #219's branch and main's unify partial-width load/store encoding produced byte-identical files, so there was no carrier-modeling-vs-partial-width ambiguity.
loom-shared/src/lib.rs — kept opt: dissolve the seam fully — post-inline SROA/scalar-forwarding eliminates the u64 ABI pack/unpack (+ wasm-local mem2reg) → toward LLVM-LTO parity #219's I64ExtendI32U & M → extend mask-collapse arm (pure addition, Z3-justified).
loom-cli/src/main.rs — kept both new passes (opt: dissolve the seam fully — post-inline SROA/scalar-forwarding eliminates the u64 ABI pack/unpack (+ wasm-local mem2reg) → toward LLVM-LTO parity #219 forward-carrier first, then main's inline_functions never inlines small multi-call-site leaves (size<10 OR-term shadows MULTI_CALL_SITE_LIMIT); + optimize CLI lacks whole-function DCE #228 dce-functions); Behavioral differential as an optimization gate (self-certify non-Z3-backstoppable transforms) #238's --differential gate wiring preserved.
test_rse_different_locals — updated: the merged pipeline (carrier-forwarding + main's stronger simplify_locals) now fully folds x=30;y=20;x+y → 50, eliminating both locals. Provably equivalent; over-specified assertion relaxed with a comment.

Validation

cargo test --release: 656 passing, 0 failed.
Behavioral differential gate (--features differential, CI corpus + repro): 4 certified, 0 diverged (httparse, json_lite, state_machine, sem.loom all semantics-PRESERVED). This is the load-bearing check — opt: dissolve the seam fully — post-inline SROA/scalar-forwarding eliminates the u64 ABI pack/unpack (+ wasm-local mem2reg) → toward LLVM-LTO parity #219's carrier-forward is not Z3-verifiable, so behavioral equivalence is the safety net.
fmt clean; clippy runs in CI.

🤖 Generated with Claude Code

Faithful local repro of gale #219's C<->Rust decide seam: `decide` returns {action, new_count} packed into an i64; the shim unpacks them. After `--passes inline` the `run` body contains the pack (extend_i32_u/shl/or) → i64 carrier local → unpack (and/shr_u/wrap_i64) round-trip on one value — exactly the residue to dissolve. Verified: optimizes + validates today; inline leaves the round-trip intact (confirmed via `loom optimize --passes inline` + wasm-tools print). The seam-SROA pass (next) must forward the scalar components and drop the carrier, yielding `(action & 0xff) + new_count` — each rewrite Z3-proven. Refs #219

gale attached the authentic post-loom-inline dissolved z_impl_k_sem_give on the issue. Decoded + verified: sha256 f81da42d… (matches), 5254 B, valid wasm. It contains the u64 pack/unpack round-trip to dissolve — pack (extend_i32_u/shl/or) at body lines 59-63, unpack (and/shr_u/wrap_i64) at 90/104, with a dead i64 carrier. Grounds the seam-SROA pass against the real body, not just the synthetic fixture. Design confirmed on #219 (proof-carrying: Z3 obligation per ISLE rewrite; ISLE-term route preferred). Implementation (mem2reg + pack/unpack forwarding + DCE) follows. Refs #219

…tend_i32_u) Lays the ISLE-layer foundation for the #219 pack/unpack forwarding rules: - types.isle: I32WrapI64 / I64ExtendI32U ValueData variants. - constructors.isle: decl + extern constructor + extern extractor for each. - loom-shared/src/lib.rs: i32_wrap_i64_extract / i64_extend_i32_u_extract (mirror iadd32_extract, single-field). - build.rs: register seam_sroa.isle AND add the two extractors to the fix_extractor_ownership list (cranelift-isle 0.132.1 emits extractor args as owned `Value`; the post-processor rewrites them to `&Value` — without this the generated trie move-then-borrows arg0 and loom-shared won't compile). - seam_sroa.isle: the unconditional, Z3-trivial rule `wrap_i64(extend_i32_u(x)) → x`. loom-shared builds clean. IMPORTANT (next step): the generated ISLE `simplify` is NOT in loom's live rewrite path — build.rs documents this, and the active rewriters are the Rust `rewrite_pure_impl` / `rewrite_with_dataflow` (loom-shared/src/lib.rs), validated by the Z3 translation validator per pass. So this rule does not fire yet; the LIVE seam-SROA rewrite must be added as a `rewrite_pure_impl` match arm (ValueData::I32WrapI64{val} where val is I64ExtendI32U{inner} → inner), with the Z3 verifier discharging the obligation. ISLE terms retained as proof-layer documentation of the identity. Refs #219

The actual rewrite (the ISLE rule committed earlier is in loom's dead simplify path; the live rewriters are the Rust rewrite_pure_impl/rewrite_with_dataflow, validated by the Z3 translation validator per pass). Adds rewrite_pure_impl match arms for the two #219 conversion terms: - I32WrapI64: recurse operand; if it is I64ExtendI32U(inner) → inner (unconditionally sound: extract(31,0, zero_ext(32,x)) = x — the u64-ABI round-trip a packed scalar pays at the dissolved decide seam); also folds wrap_i64(i64.const N) → i32.const (low 32). - I64ExtendI32U: recurse operand; folds extend_i32_u(i32.const N) → i64.const (zero-extended). Test: test_seam_sroa_wrap_extend_identity — wrap_i64(extend_i32_u(local.get 0)) dissolves to local.get 0. Full `cargo test --release --lib --test-threads=4` green, 0 failures. This is the foundation case. The sem repro's full dissolution additionally needs the carrier mem2reg + the masked and/shr_u forwarding rules (steps 2-3), since its unpack uses and/shr_u (not wrap) on the packed u64. Refs #219

…lves) Dissolves the u64-pack unpack arithmetic, in rewrite_pure_impl (Z3-validated per pass; pure-integer so the verifier actually verifies — no skip). Each rule is unconditionally sound with only constant side-conditions (no range analysis): I64And (the `& mask` unpack): - (shl Z k) & M → 0 when M's set bits are all below k (M>>k==0, 0<k<64) - (or A B) & M → survivor & M when one OR operand is such a cleared shift I64ShrU (the `>> k` unpack): - (shr_u (shl Z k) k) → Z & (u64::MAX>>k) (shl-then-shr-same masks low 64-k) - (shr_u (or P Q) k) → (P>>k)|(Q>>k) when an operand is (shl _ k) (logical shift distributes over OR; targeted to the pack shape — no general bloat; recursion collapses the shl side via the rule above) Together these reduce a forwarded pack `(or (shl (extend_u hi) 32) lo)`: & 0xff → lo & 0xff (low field) >> 32 → (extend_u hi) & 0xffffffff (high field; lo const shifts out) Tests: test_seam_sroa_mask_clears_shifted_pack, test_seam_sroa_shr_extracts_ high_field (+ the step-1 wrap/extend test). Full `cargo test --release --lib --test-threads=4` green (399/28/17, 0 failures). These fire once inlining delivers pack/unpack adjacency (the inline-modelability extension for control-flow + trap-call decides is the remaining prerequisite, tracked separately — higher blast radius, needs gale silicon validation). Refs #219

…ier) Design for the #219 inline prerequisite: wire in precise br_table/br_if/br symbolic execution (the deferred ExecutionState/merge_states scaffolding) so the verifier can soundly prove inline-equivalence of br_table-dispatching decides. Approach: state-merging-with-ITE via a label/continuation stack (NOT path forking), exact for the acyclic single-exit case. Grounded in Alive2 (PLDI'21, the production SMT bounded-TV precedent), Kuznetsov state-merging (PLDI'12), WASP (ECOOP'22 wasm br_table rules), Boogie block-predicate WP. Includes the soundness argument, the explicit pitfalls (selector constraint, fall-through join, ⊥-not- havoc for trap arms, branch arity, over-approx-and-skip), 2-phase staging (sem_give integer first, then memory for mutex/pipe), and the adversarial verify-or-revert + silicon test gate. For gale review before implementation. Refs #219

…tection) First component of the PR-C precise-CF inline-verifier extension (design approved by gale on #219, 2026-06-17). `is_noreturn_callee(func)`: conservative-sound — true iff the body has no Return and no Br/BrIf/BrTable anywhere (recursing) and ends in Unreachable. Admits exactly the core::panicking helpers (panic_fmt → unreachable; panic_const_add_overflow → call panic_fmt; unreachable) and rejects any normal-returning or branching function. The by-body inline modeler (PR-C M3, next) will use this to encode a `call` to such a callee as ⊥ "diverge if reached" — never havoc — so a trapping branch constrains only reachability, never the post-state (Alive2 `ub` discipline; avoids the #155/#159 havoc-vs-concrete false-positive surface). Test: test_is_noreturn_callee (bare trap / trap-after-call / normal-return / no-trailing-trap / branch-escape). #[allow(dead_code)] until the M3 gate wires it. Refs #219

The soundness-critical core of the inline-modelability extension: a precise symbolic executor for structured, acyclic wasm control flow (block / if / br / br_if / br_table) by state-merging-with-ITE over a label/join stack — never path-forking. For a bounded acyclic region this is EXACT (a passing SMT check is a real proof). Precedent: Alive2 (PLDI'21), Kuznetsov state-merging (PLDI'12), WASP br_table rules (ECOOP'22). See docs/design/pr-c-precise-cf-verification.md. Built as a NEW ISOLATED function, not by mutating the shared encoders (encode_block_body / encode_function_to_smt_impl_inner stay untouched -> zero regression risk to existing passes). encode_simple_instruction's catch-all HAVOCS (pushes a fresh BV without popping operands), so delegating an unmodeled op would desync the stack; instead an allowlist gate (is_acyclic_modelable_body) front-loads soundness and the executor runs only on admitted functions. Over-approximate-and-skip everything else: loops (back-edge), calls, memory, unknown opcodes, multi-result/param blocks. br_table default guard is UNSIGNED bvuge(sel, n) — a signed >= would leave selectors in [2^31, 2^32) in a partition gap. All #[cfg(feature="verification")] + #[allow(dead_code)] until M3 wires it into inline_functions' verify-or-revert. Tests (5, green): br_table switch proves equivalent to the if/else chain; identity; detects wrong-default and swapped-arm counterexamples; gate rejects loops/memory, admits the switch. Also folds in cargo-fmt drift in loom-shared from #219 SROA @4356441. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…e acyclic executor Extends exec_acyclic to model a `call F` BY BODY (the inline-soundness principle from #155 extended to control flow): pop args, recursively encode the callee through exec_acyclic with params bound to the args, declared locals zero-init, globals SHARED with the caller (callee global writes propagate back); push the merged single result and narrow the caller path to the callee's returning paths. A call to a no-return callee (is_noreturn_callee: panic -> unreachable) sets reachable=false — the path DIVERGES (drops out of the result ITE), NEVER havoc (Alive2 ub discipline; havoc reopens the #155/#159 false-positive surface). Threads VerificationSignatureContext (function_bodies, the existing #151 callee lookup) + a recursion depth into exec_acyclic / encode_acyclic_function / is_acyclic_modelable_body. MAX_ACYCLIC_CALL_DEPTH=8 bounds by-body inlining and makes a cyclic/recursive call graph fail safely (reject, never infinite-unroll). The gate now admits Call(g) iff g is a local function that is no-return or recursively acyclic-modelable; imports/call_indirect/over-deep stay rejected. exec_acyclic remains the soundness enforcer (Err, never havoc, on anything unmodelable). Still #[cfg(verification)] + #[allow(dead_code)] until M3 wires it into inline_functions' verify-or-revert. Tests (7 total, green): + call-by-body of a br_table callee proves equivalent to its inlined switch (the seam capability); + no-return call diverges not havoc. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Adds the equivalence harness over the precise acyclic CF executor: encodes both functions via encode_acyclic_function (deterministic input names unify in Z3) and checks a == b has no counterexample. Mirrors verify_function_equivalence's soundness boundaries (no width-matching the final equality; Unknown -> Err). Returns Err whenever either side is not acyclic-modelable, a width mismatches, or the encoding fails — so the caller (inline_functions' verify-or-revert, wired in M3.2) FALLS BACK to the existing encoder. This only ADDS proving power for the br_table-callee inline case; it never removes a check. Void functions defer to the fallback (no side-effect equivalence modeled yet). Still #[allow(dead_code)] until M3.2 wires it into inline_functions. Tests (9 total, green): + test_verify_acyclic_equivalence_inline — a caller that calls a br_table decide verifies equivalent to the inlined switch, a wrong-default body does NOT verify, and a loop function Errs (caller falls back). Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…liner TranslationValidator::verify() now tries verify_acyclic_equivalence for the inline pass before the existing encoder. The main encoder models br_table approximately and a br_table callee is not straight-line-by-body-modelable, so it cannot prove `call F` (br_table callee) equal to its inlined body → the inline reverts. The acyclic executor is EXACT for acyclic CF + by-body calls, so it can. PURELY ADDITIVE + scoped to pass_name == "inline_functions": ONLY a precise Ok(true) short-circuits to accept; Ok(false)/Err/panic FALL THROUGH to the existing verify_function_equivalence_with_context unchanged (acyclic call wrapped in its own catch_unwind). This can only ADD accepts (the br_table-callee inlines), never weaken an existing check; every non-inline pass is byte-identical. This also makes the M1a/M1b/M3.1 executor chain live (the #[allow(dead_code)] markers are now harmless no-ops). No br_table callee is admitted as an inline CANDIDATE yet (function_inline_modelable is still straight-line — M3.3), so this is a behavioral no-op on the current corpus: full `cargo test --release --lib` is 408 passed / 0 failed / 2 ignored, unchanged. clippy + fmt clean. The capability is proven by the 9 acyclic unit tests; M3.3 enables it end-to-end on the seam. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…+ Unreachable trap function_inline_modelable now admits a callee under EITHER regime A (straight-line incl. memory — the existing by-body modeler, #155/#157/#161) OR regime B (acyclic control flow + pure-integer ops + by-body calls to no-return/recursively-modelable local callees — the precise acyclic executor, PR-C). Mirrors verify::is_acyclic_modelable_body so the candidate gate admits exactly what the acyclic verifier can prove (admitting an unprovable callee → caller reverts, the #159 collateral problem). Threads module context (functions + num_imported_funcs) into the gate + a recursion-depth bound; a br_table+memory callee passes neither regime → stays opaque until Phase 2. Also handles Unreachable as a trap (⊥) in exec_acyclic and both gates: the path diverges (reachable=false, drops out of the result ITE), never havoc — the seam decides end their overflow arms in `call panic_const; unreachable`, so this is required to model them at all. Full `cargo test --release --lib` = 409 passed / 0 failed / 2 ignored (+1: test_acyclic_unreachable_diverges). fmt + clippy clean. SCOPE NOTE (found by inspecting the real repro-219/sem.loom.wasm): decide (gale_k_sem_give_decide) IS regime-B-eligible (acyclic br_table + i64 pack + no-return panic call, no memory), so it is now an inline candidate. BUT its caller z_impl_k_sem_give READS MEMORY (i32.load offset=8/12 for decide's args) and calls imports, so proving decide's inline INTO z_impl needs the acyclic executor to model z_impl's memory = PHASE 2. So sem_give's seam will not dissolve until Phase 2 (memory Array threaded through exec_acyclic). M3.1-M3.3 are correct + necessary infra; Phase 2 is the next real enabler for sem_give. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ecutor The acyclic executor now models linear memory (loads/stores), required because real seam callers (z_impl_k_sem_give) read memory for the decide's args. AcyclicState gains `memory: Array`. Memory is MUTABLE branch state, so it is ITE-merged at every join exactly like locals/globals: BranchEntry/MergedState carry the Array, acyclic_merge_entries ITE-merges it (Bool::ite over the Array sort), acyclic_push_branch records it, acyclic_apply_join sets it, and the If arm saves/restores it for the else branch. exec_acyclic handles I32/I64 load/store + partial-width load8/16/store8/16 EXPLICITLY via the shared encode_i32_load_from / encode_i64_load_from / encode_*_store_into / encode_partial_* helpers (the same ones the main encoder + by-body modeler use -> bit-identical memory expressions); they are NEVER delegated to encode_simple_instruction (which havocs). memory is threaded through encode_acyclic_call (shared + propagated like globals) and initialized in encode_acyclic_function as Array::new_const("memory", ...) -- same name as the main encoder so both encodings unify. Gates: new is_acyclic_memory_op (verify.rs) admitted by is_acyclic_modelable_body; lib-side is_acyclic_memory_modelable_instr admitted by the regime-B inline gate. Call gate stays STRICT (no impure-call havoc yet -- that is Phase 2b, the soundness-critical part). So sem_give's z_impl (which also calls k_spin_lock import + z_unpend) is not yet fully modelable; 2a is the contained, testable foundation. Full `cargo test --release --lib` = 410 passed / 0 failed / 2 ignored. New test test_acyclic_memory_merge: store-on-each-branch then load proves equivalent to the direct if/else (validates the per-join memory ITE-merge feeds the loaded value). test_acyclic_modelable_gate updated (memory now admitted). fmt + clippy clean. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The acyclic executor now models impure calls (imports + non-acyclic-modelable local callees) by HAVOC, so memory-bearing seam callers like z_impl_k_sem_give (which call k_spin_lock + z_unpend) become verifier-modelable end-to-end. exec_acyclic's Call arm is now 3-tier (mirrors the main encoder): no-return -> diverge; acyclic single-result local -> by-body (encode_acyclic_call); else (import / multi-result / too-deep / loop-containing) -> acyclic_havoc_call: pop args, push fresh `call_{g}_result_{i}`, havoc all globals + memory with a SHARED per-encode counter. The counter is threaded (&mut u64) through exec_acyclic + encode_acyclic_call and reset per encode_acyclic_function -- so the Nth impure call gets the SAME havoc id in the original and optimized encodings -> they unify in Z3. Havoc is conservative-sound: it never forges a value equal to a concrete one, so a havoc-vs-concrete mismatch refuses to prove (the #155/#159 guard). is_acyclic_modelable_body is now purely STRUCTURAL (drops ctx/depth params -- any direct call is modelable since havoc covers the opaque case; clippy only_used_in_recursion). The lib candidate gate is unchanged (decide already admitted via its no-return panic call). Full `cargo test --release --lib` = 413 passed / 0 failed / 2 ignored. New tests: test_acyclic_havoc_call_unifies (havoc id parity); test_acyclic_havoc_not_concrete (havoc'd result NOT provably equal to a concrete value -- soundness); test_acyclic_havoc_memory (load after impure call reads havoc'd memory, not the initial -- soundness). fmt + clippy clean. Next: e2e dissolve on repro-219/sem.loom.wasm (decide's inline into z_impl should now be PROVEN by the acyclic path -> no `call $gale_k_sem_give_decide`). Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…ow (dormant) Foundation for the #219 perf milestone (dissolve the u64 ABI pack/unpack the proven inline leaves in z_impl). Extends rewrite_with_dataflow to forward a SINGLE-ASSIGNMENT local's pure defining expression to its uses (carrier mem2reg), gated by a caller-supplied `OptimizationEnv.single_assign` set -- EMPTY by default, so this is DORMANT and behavior is byte-identical to before (confirmed: full loom-core lib suite 413 passed / 0 failed; loom-shared clippy clean). Soundness of the mechanism (the loom-core pass that seeds single_assign lands next, and only seeds TOP-LEVEL single-assignment locals so the def dominates all uses): - is_forwardable_expr: pure-op whitelist (const/local/global/int arith-bitwise-shift-rotate-compare/converts/select); calls, loads/stores, div/rem, floats -> not forwardable. - reaching-defs guard: invalidate_pins_referencing drops a pinned forwarding whenever an input local is reassigned (in-traversal for straight-line/Block/ Loop; the If arm intersects both forks' pins so a branch that reassigns an input drops the pin). So a forwarded expression always carries its def-time inputs. - pinned survives CF clear() (sound for single-assignment: one value on every path reaching a use). This transform is NOT Z3-backstoppable for the seam (void z_impl + value flows only into a havoc'd impure-call arg -> vacuous pass); gale's G474RE silicon is the behavioral gate (per #219 -- gale will flash inline-only vs inline+teardown). Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…mechanism wired) Activates the #219 carrier scalar-forwarding via the dormant mechanism (@186d236): new optimize pass forward_carrier_locals + `forward-carrier` CLI pass (after inline) + a call in optimize_module Phase 1b. Per function it forwards locals with exactly one write whose single write is at the FUNCTION TOP LEVEL (def dominates all uses), then rewrite_pure applies the SROA rules; verify_or_revert guards (gale silicon is the behavioral gate for the void seam). Full lib suite 413/0; fmt + clippy clean. KNOWN-CONSERVATIVE (does not yet dissolve the sem_give seam): the seam carrier (local.tee 3 in inlined z_impl) is NESTED in `block @2`, not top level, so the top-level proxy excludes it. Verified by hand it IS sound to forward -- the use `local.get 3` is reached only via `br_if @2` positioned AFTER the tee, so the def dominates the use (overflow arm traps via `unreachable`, never reaching it), and inputs local4/local7 are set only before the def. Admitting it needs a real TRAP-AWARE STRUCTURED DOMINANCE check to replace the top-level proxy (next); then the dual silicon flash (inline-only vs inline+teardown) validates. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…te generalized) Replaces the conservative top-level-only proxy in forward_carrier_locals with carrier_def_dominates_uses: a single-assignment local is forwarded iff its def DOMINATES all uses by a sound structured-CFG argument (def not under If/Loop; all uses textually after it; no pre-def Br/BrIf/BrTable escapes the def's enclosing blocks; returns/unreachable before the def are safe). Admits the nested-but- dominating seam carrier the top-level proxy excluded. Soundness over completeness (a wrong `true` = silent miscompile). Full lib suite 414/0 incl. test_carrier_ dominance (straight-line, use-before-def, def-in-If, nested-dominating, escaping-br). fmt + clippy clean. NOT YET dissolving the e2e seam: inline+forward-carrier leaves z_impl BYTE-IDENTICAL to inline-only (same hash) -- the pass makes no NET change to z_impl. The dominance gate is no longer the blocker; the pass is either finding single_assign empty for z_impl OR forwarding-then-reverting (revert restores the identical inline body) OR the SROA rules not firing on forwarded terms (instructions_to_terms DOES handle z_impl's Call/load/br_table, so not a conversion skip). Next: instrument forward_carrier_locals (single_assign contents / changed? / reverted?) to pinpoint. Refs #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…able-safe seam teardown) The term IR round-trip is lossy for control flow (br_if/br_table are pushed as value terms), so term-based carrier forwarding turns a br_table function like z_impl stack-invalid. Replace it with an instruction-level transform: - simplify_pure_windows: apply rewrite_pure only to maximal pure straight-line windows (which round-trip faithfully), never across branches. - extract_carrier_rhs + forward_carrier_in_body: forward a single-assignment dominating carrier's pure RHS to its uses, then drop the dead def (tee keeps the value on the stack; set drops set+RHS). - never-pessimize size guard: revert any forwarding that does not shrink the fn. - rewrite_pure: extend_i32_u(x) & M -> extend_i32_u(x) when M's low 32 bits are all set (Z3-sound — a zero-extend already fits in 32 bits). On the sem seam the &0xff branch fully dissolves and the shl/tee vanish; a status>>32 residual remains pending a narrow-local value analysis. Sound, validates, and net-shrinks (verify_or_revert + size guard). gale G474RE silicon is the behavioral gate. Refs: #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The forwarded unpack (extend(a)<<32 | status) >> 32 distributes (in rewrite_pure) to extend(a) | (status>>32) but can't finish on its own: status>>32 -> 0 needs the value-range fact status < 2^32, which context-free SROA can't see. Add a narrow-local analysis (loom-core mod optimize): - def_value_is_narrow / narrow_locals: a local is narrow (< 2^32, high 32 bits always 0) when EVERY def stores i64.extend_i32_u(_), an i64.const with the high 32 bits clear, or (_ & i64.const c) with c < 2^32. - fold_narrow_high_shr: peephole (shr_u (local.get N) k) -> i64.const 0 when N is narrow and (k mod 64) >= 32. - forward_carrier_locals iterates fold+windows to a fixpoint: simplify_pure_windows distributes the unpack and EXPOSES the bare status>>32, the next fold zeroes it, the next window collapses (or extend 0) -> extend. z_impl now FULLY dissolves: the store site becomes local.get 0; local.get a; i64.extend_i32_u; i64.store32 offset=8 storing the thread pointer directly. shl / tee / or / shr_u and the redundant mask are all gone. 416 lib tests green, e2e validates. Refs: #219 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# Conflicts: # loom-cli/src/main.rs # loom-core/src/lib.rs # loom-shared/src/lib.rs

…fold The origin/main merge combines #219's optimize_module phases (forward_carrier_locals) with main's stronger simplify_locals. Together they now fold `local x=30; local y=20; return x+y` all the way to `[I32Const(50)]` (both locals eliminated, add constant-folded) instead of leaving one surviving `$y` store. The old assertion expected exactly 1 store; the new fully-folded form is strictly more aggressive and provably equivalent (30 + 20 == 50). The behavioral differential gate certifies semantics PRESERVED on this module and on the whole CI corpus (httparse/json_lite/state_machine/ sem.loom: 4 certified, 0 diverged). Not a miscompile — a stale, over- specified intermediate-form expectation invalidated by the merge. Refs #219 Refs #238 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

avrabe and others added 21 commits June 15, 2026 03:10

Merge remote-tracking branch 'origin/main' into feat/219-seam-sroa

4f1ead2

# Conflicts: # loom-cli/src/main.rs # loom-core/src/lib.rs # loom-shared/src/lib.rs

avrabe merged commit 7f5d8b4 into main Jul 1, 2026
23 of 25 checks passed

avrabe deleted the feat/219-seam-sroa branch July 1, 2026 22:12

avrabe mentioned this pull request Jul 1, 2026

opt: dissolve the seam fully — post-inline SROA/scalar-forwarding eliminates the u64 ABI pack/unpack (+ wasm-local mem2reg) → toward LLVM-LTO parity #219

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

opt(#219): dissolve the inlined seam — windowed SROA + carrier-forwarding + narrow-local#252

opt(#219): dissolve the inlined seam — windowed SROA + carrier-forwarding + narrow-local#252
avrabe merged 21 commits into
mainfrom
feat/219-seam-sroa

avrabe commented Jul 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

avrabe commented Jul 1, 2026

What it does

Brought up to date with main (merge, one resolution pass)

Validation

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant