
fix(selector): i64 rotl/rotr/div_u/rem_u compute real results — never silent 0 (#610)#613

Merged: avrabe merged 1 commit into main from fix/610-i64-rot-div on Jul 3, 2026

Conversation

@avrabe avrabe commented Jul 3, 2026


Per-op root cause + verdict (all four: REAL FIX, no loud-rejects needed)

Filed by the challenge harness (#610): i64.rotl/rotr/div_u/rem_u compiled without error on the Cortex-M path and returned 0 for every input (rotl by 0 — the identity — returned 0). One disease in the Thumb-2 encoder's expansions, two forms:

| op | root cause | verdict |
| --- | --- | --- |
| i64.rotl / i64.rotr | expansion used hardcoded R3/R4 scratch colliding with selector-assigned regs, then its own POP {R4} restored saved scratch over the computed result (rd_lo == R4 in the repro) → returns the caller's stale R4 = 0 under qemu reset state | implemented: core rewritten to fixed regs (R0:R1 value, R2 amount, R3+R12 scratch) inside the new fixed-ABI wrapper |
| i64.div_u / i64.rem_u | expansion ignored its register fields outright (rdlo: _, …; hardcoded R0:R1 / R2:R3 in, result to R0:R1) while the selector allocated rd = R4:R5, which the core's own POP {R4-R7} then clobbered | implemented: shift-subtract cores byte-identical, wrapped in the fixed-ABI marshal/restore |
| i64.div_s / i64.rem_s (same disease, fixed together) | identical fixed-ABI mismatch | implemented |
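
The rotl/rotr failure mode in the table can be illustrated with a toy register model. This is only a sketch of the save/compute/restore ordering described above, not the encoder's code; `buggy_rot_expansion` and the dict-based register file are hypothetical names introduced for illustration.

```python
def buggy_rot_expansion(regs, rd_lo, compute):
    """Toy model of the pre-fix shape: PUSH {R4}; compute into rd_lo; POP {R4}."""
    stack = [regs["R4"]]       # PUSH {R4}: save the hardcoded scratch register
    regs[rd_lo] = compute()    # the core lands its result in the selector's rd_lo
    regs["R4"] = stack.pop()   # POP {R4}: restores scratch even when rd_lo is R4
    return regs[rd_lo]


# Reset state: every register holds 0 (an assumption mirroring the repro).
result_in_r4 = buggy_rot_expansion({"R4": 0, "R5": 0}, "R4", lambda: 256)
result_in_r5 = buggy_rot_expansion({"R4": 0, "R5": 0}, "R5", lambda: 256)
```

When the selector assigns `rd_lo` to R4, the restore overwrites the freshly computed value with the saved one, so the caller sees the stale 0; with any other destination the result survives.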

The fixed-ABI wrapper:

1. Save R0-R3.
2. Marshal the operand regs into the core's fixed inputs via the stack (permutation-safe: every source is read before any fixed reg is written).
3. Run the core (self-preserving for R4+; R12 is encoder scratch, never allocatable, #212).
4. MOV the result pair into the selector's rd (loud Err on the impossible swapped pair, the #554 honesty floor).
5. Restore R0-R3, skipping registers the result occupies.

Both codegen paths benefit (direct/--relocatable and optimized; both pass real registers).
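
Why marshalling through the stack is permutation-safe can be shown with a small simulation. This is a sketch under assumptions: the register file is a plain dict, and `marshal_naive` / `marshal_via_stack` are hypothetical helpers, not functions from the encoder.

```python
def marshal_naive(regs, sources, fixed):
    """Sequential MOVs (fixed[i] <- sources[i]); unsafe if a source is also a fixed reg."""
    out = dict(regs)
    for dst, src in zip(fixed, sources):
        out[dst] = out[src]
    return out


def marshal_via_stack(regs, sources, fixed):
    """Read every source first (PUSH), then write the fixed regs (POP)."""
    out = dict(regs)
    stack = []
    for src in reversed(sources):   # push in reverse so the first source ends on top
        stack.append(out[src])
    for dst in fixed:               # pops only start after all sources were read
        out[dst] = stack.pop()
    return out


regs = {"R0": 10, "R1": 11, "R2": 12, "R3": 13}
sources = ["R1", "R0", "R3"]        # value lo in R1, value hi in R0, amount in R3
fixed = ["R0", "R1", "R2"]          # the core's fixed inputs

naive = marshal_naive(regs, sources, fixed)
safe = marshal_via_stack(regs, sources, fixed)
```

With the swapped pair above, the naive sequence overwrites R0 before it is read as the second source, so R1 ends up with 11 instead of 10; the stack variant reads all three sources before writing anything and yields R0=11, R1=10, R2=13.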

Bonus per WASM semantics: divide-by-zero now traps (ORRS R12,R2,R3; BNE +0; UDF #0, matching the i32 guard) — previously div/0 silently returned 0.
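
A reference model of the guarded unsigned division may help when reading the guard sequence. This is a textbook restoring shift-subtract loop written under assumptions; the PR states the real cores are shift-subtract but does not show them, and `divmod_u64` and `Trap` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


class Trap(Exception):
    """Stands in for the UDF #0 trap."""


def divmod_u64(n, d):
    n &= MASK64
    d &= MASK64
    d_lo, d_hi = d & 0xFFFFFFFF, d >> 32
    if (d_lo | d_hi) == 0:            # like ORRS R12,R2,R3: zero only if both halves are zero
        raise Trap("integer divide by zero")
    q = r = 0
    for bit in range(63, -1, -1):     # one quotient bit per iteration, MSB first
        r = (r << 1) | ((n >> bit) & 1)
        if r >= d:                    # subtract when the partial remainder fits
            r -= d
            q |= 1 << bit
    return q, r


quotient, remainder = divmod_u64(100, 7)
```

The first element of the pair corresponds to i64.div_u and the second to i64.rem_u; a zero divisor raises instead of returning 0.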

Red→green

scripts/repro/i64_rot_div_610_differential.py — 55 vectors vs wasmtime under unicorn (-t cortex-m3 -n <fn> --relocatable, the issue's exact config): rot-by-0 identity, rot 32/63/≥64 (mod-64), _hi twins checking the upper result half, div by 1/self/0 (both sides trap), high-bit patterns, >32-bit divisors, signed variants, shl4 control.

  • v0.30.0 (pre-fix): 40/55 MISMATCH (every rot/div/rem vector wrong — issue rows reproduced exactly: rotl(1,8)=0 want 256, div_u(100,7)=0 want 14, …; controls OK)
  • post-fix: 55/55 OK, exit 0
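
For orientation, the rotate vectors can be checked against a plain-integer model of the WASM semantics (amount taken mod 64). This is a sketch and not the differential script itself; `rotl64`, `rotr64`, and `hi32` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1


def rotl64(x, n):
    """i64.rotl: rotate left by n mod 64."""
    x &= MASK64
    n &= 63
    return ((x << n) | (x >> (64 - n))) & MASK64


def rotr64(x, n):
    """i64.rotr: rotate right by n mod 64, expressed as a left rotate."""
    return rotl64(x, -n & 63)


def hi32(x):
    """Upper result half, as checked by the _hi twin vectors."""
    return (x >> 32) & 0xFFFFFFFF


issue_row = rotl64(1, 8)       # the issue row: pre-fix returned 0
identity = rotl64(1, 0)        # rot-by-0 is the identity
wrapped = rotl64(1, 64)        # amounts >= 64 wrap mod 64
upper = hi32(rotl64(1, 32))    # result lives entirely in the upper half
```

Here `issue_row` is 256, matching the "want 256" row above, and `upper` shows why the upper-half twins matter: a check of only the low 32 bits would see 0 for a correct result.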

Wired as an isolated CI oracle job (i64-rot-div-610-oracle), same shape as the #503/#587 job.

Gates

Closes #610

🤖 Generated with Claude Code

… silent 0 (#610)

All four ops (plus div_s/rem_s, same disease) compiled without error and
returned 0 for every input on the ARM Cortex-M path. Root cause was in the
Thumb-2 encoder's multi-instruction expansions, one disease in two forms:

* I64Rotl/I64Rotr: the expansion used hardcoded R3/R4 scratch that collided
  with selector-assigned registers, then its own `POP {R4}` restored the
  saved scratch OVER the computed result (rd_lo == R4 in the repro) — the op
  returned the caller's stale R4: 0 under qemu/unicorn reset state.
* I64DivU/I64RemU/I64DivS/I64RemS: the expansion IGNORED its register fields
  outright (`rdlo: _, ...` — hardcoded R0:R1 dividend, R2:R3 divisor, result
  to R0:R1) while the selector allocated rd elsewhere (R4:R5), which the
  core's own POP then clobbered with stale values.

Fix: a fixed-ABI wrapper around each core — save R0-R3, marshal the operand
registers into the core's fixed input regs via the stack (permutation-safe:
every source is read before any fixed reg is written), run the core
(self-preserving for R4+; R12 is encoder scratch, never allocatable #212),
MOV the result pair into the selector's rd (loud Err on the impossible
swapped pair), restore R0-R3 skipping the result registers. The rot cores are
rewritten to fixed regs (R0:R1 value, R2 amount, R3+R12 scratch); the div/rem
shift-subtract cores are byte-identical inside the wrapper. Divide-by-zero
now traps (`ORRS R12,R2,R3; BNE +0; UDF #0`), matching WASM semantics and the
i32 guard — previously div/0 silently returned 0.

Estimator kept in exact agreement (#498/#511 oracle): rot 74→102 bytes,
div_u/rem_u/div_s/rem_s 74/78/126/124 → 120/124/172/170; all sizes are
register-independent by construction. Frozen fixture hashes bit-identical
(these ops appear in no frozen anchor).

Red→green: scripts/repro/i64_rot_div_610_differential.py (55 vectors — rot
identity/32/63/>=64 + hi-half twins, div by 1/self/0-trap, high-bit patterns,
signed variants, shl control) vs wasmtime under unicorn: 40/55 MISMATCH on
v0.30.0, 55/55 OK after. Wired as an isolated CI oracle job. New encoder unit
tests pin the rd-landing tail, the zero-divisor guard, the rd∈R0-R3
skip-restore, and the swapped-pair loud reject.

Closes #610

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jul 3, 2026


Codecov Report

❌ Patch coverage is 98.84170% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/synth-backend/src/arm_encoder.rs | 98.81% | 3 Missing ⚠️ |


@avrabe avrabe merged commit 76d99a6 into main Jul 3, 2026
29 checks passed
@avrabe avrabe deleted the fix/610-i64-rot-div branch July 3, 2026 17:10
avrabe added a commit that referenced this pull request Jul 3, 2026
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>