Skip to content

avoid unaligned aliasing load in neon vint4 byte constructor#652

Merged
solidpixel merged 2 commits into
ARM-software:mainfrom
sahvx655-wq:neon-vint4-byte-load
Jun 26, 2026
Merged

avoid unaligned aliasing load in neon vint4 byte constructor#652
solidpixel merged 2 commits into
ARM-software:mainfrom
sahvx655-wq:neon-vint4-byte-load

Conversation

@sahvx655-wq

Copy link
Copy Markdown
Contributor

The NEON vint4(const uint8_t*) constructor reads its four bytes with vld1_dup_u32(reinterpret_cast<const uint32_t*>(p)). p is a plain byte pointer with no four-byte alignment guarantee, so the cast both performs a potentially unaligned load and reads the uint8_t storage through a uint32_t lvalue, which breaks strict aliasing. SuiteVint4.UnalignedLoad8 already constructs from u8_data + 1 (an odd address), and the same constructor runs for every texel during encode via load_texel_u8, so the path is well exercised.

This is the defect #650 removed from the SSE and AVX2 backends by copying through a scalar with memcpy before broadcasting; NEON was left on the old reinterpret-cast path, and an x86 UBSAN run does not flag it because the load sits inside an intrinsic. I have applied the same memcpy form the load() helper in this header already uses, so the bytes are read alignment- and aliasing-safely before the existing vmovl widening. Results are bit-identical for aligned data and the full NEON unit suite stays green; keeping the load in the library helper avoids asking callers to pre-align.

@solidpixel solidpixel self-requested a review June 26, 2026 12:14
@solidpixel solidpixel self-assigned this Jun 26, 2026
@solidpixel

Copy link
Copy Markdown
Contributor

PR looks fine, but I want to merge #653 first so we have some unit test support in place to catch the issues automatically. Currently UBSAN and ASAN rely on manual test runs.

@solidpixel

solidpixel commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Confirmed that the new tests in #653 would catch the issue (only on GCC - Clang doesn't complain). Will merge this first, rebase the other PR on top, and new tests should start passing.

@solidpixel solidpixel merged commit 6d99597 into ARM-software:main Jun 26, 2026
9 checks passed
@sahvx655-wq

Copy link
Copy Markdown
Contributor Author

Makes sense to land #653 first. The Clang gap lines up with what I saw when I dug into this: the unaligned load sits inside the vld1 intrinsic, so -fsanitize=alignment never fires on it, which is why the x86 UBSAN CI stayed quiet. Good that the GCC run picks it up now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants