avoid unaligned aliasing load in neon vint4 byte constructor#652
Merged
Conversation
Contributor
|
PR looks fine, but I want to merge #653 first so we have some unit test support in place to catch the issues automatically. Currently UBSAN and ASAN rely on manual test runs. |
Contributor
|
Confirmed that the new tests in #653 would catch the issue (only on GCC - Clang doesn't complain). Will merge this first, rebase the other PR on top, and new tests should start passing. |
solidpixel
approved these changes
Jun 26, 2026
Contributor
Author
|
Makes sense to land #653 first. The Clang gap lines up with what I saw when I dug into this: the unaligned load sits inside the vld1 intrinsic, so -fsanitize=alignment never fires on it, which is why the x86 UBSAN CI stayed quiet. Good that the GCC run picks it up now. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The NEON vint4(const uint8_t*) constructor reads its four bytes with vld1_dup_u32(reinterpret_cast<const uint32_t*>(p)). p is a plain byte pointer with no four-byte alignment guarantee, so the cast both performs a potentially unaligned load and reads the uint8_t storage through a uint32_t lvalue, which breaks strict aliasing. SuiteVint4.UnalignedLoad8 already constructs from u8_data + 1 (an odd address), and the same constructor runs for every texel during encode via load_texel_u8, so the path is well exercised.
This is the defect #650 removed from the SSE and AVX2 backends by copying through a scalar with memcpy before broadcasting; NEON was left on the old reinterpret-cast path, and an x86 UBSAN run does not flag it because the load sits inside an intrinsic. I have applied the same memcpy form the load() helper in this header already uses, so the bytes are read alignment- and aliasing-safely before the existing vmovl widening. Results are bit-identical for aligned data and the full NEON unit suite stays green; keeping the load in the library helper avoids asking callers to pre-align.