Compiled with RUSTFLAGS=-Ctarget_feature=-neon (for aarch64-unknown-linux-gnu):
#![feature(simd_ffi)]
use std::arch::aarch64::*;
fn main() {
// The target_feature unsafety contract requires us to test this first.
if std::arch::is_aarch64_feature_detected!("neon") {
unsafe { test(); }
}
}
#[target_feature(enable = "neon")]
unsafe fn test() {
const A: [u32; 4] = [40, 30, 16, 9];
const B: [u32; 4] = [2, 12, 26, 33];
let a: uint32x4_t = vld1q_u32(A.as_ptr());
let b: uint32x4_t = vld1q_u32(B.as_ptr());
let r = trampoline(a, b);
println!("{a:?} + {b:?} -> {r:?}");
}
fn trampoline(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t {
unsafe { add(a, b) }
}
extern "C" {
// The C implementation is a simple pass-through to `vaddq_u32(a, b)`.
fn add(a: uint32x4_t, b: uint32x4_t) -> uint32x4_t;
}
Ideally, trampoline would fail to compile, because it does not have Neon and shouldn't be able to represent the vector types.
- The call to
trampoline(a, b) passes the arguments in memory (using the Rust ABI).
- The subsequent call to
add(a, b) tries to pass each argument in four w registers (each holding a u32), as if they are tuples (u32, u32, u32, u32).
- The C implementation expects arguments in Neon registers (
v0 and v1), so the result is unpredictable.
If test() — which has "neon" enabled — calls add(a, b) directly, it uses v0 and v1, as per AAPCS64.
This is the AArch64 counterpart to #116344 and #114479, with the twist that on AArch64, it's preferable for Neon-specific types to fail to compile without the proper features. These aren't general-purpose types. At least some C compilers refuse to compile code that uses Neon types when -mcpu=+nosimd+nofp is specified.
Meta
This came out of a Zulip discussion.
rustc --version --verbose:
rustc 1.76.0-nightly (a1a37735c 2023-11-23)
binary: rustc
commit-hash: a1a37735cbc3db359d0b24ba9085c9fcbe1bc274
commit-date: 2023-11-23
host: x86_64-unknown-linux-gnu
release: 1.76.0-nightly
LLVM version: 17.0.5
Compiled with
RUSTFLAGS=-Ctarget_feature=-neon(foraarch64-unknown-linux-gnu):Ideally,
trampolinewould fail to compile, because it does not have Neon and shouldn't be able to represent the vector types.trampoline(a, b)passes the arguments in memory (using the Rust ABI).add(a, b)tries to pass each argument in fourwregisters (each holding au32), as if they are tuples(u32, u32, u32, u32).v0andv1), so the result is unpredictable.If
test()— which has "neon" enabled — callsadd(a, b)directly, it usesv0andv1, as per AAPCS64.This is the AArch64 counterpart to #116344 and #114479, with the twist that on AArch64, it's preferable for Neon-specific types to fail to compile without the proper features. These aren't general-purpose types. At least some C compilers refuse to compile code that uses Neon types when
-mcpu=+nosimd+nofpis specified.Meta
This came out of a Zulip discussion.
rustc --version --verbose: