One of the `swizzle_dyn` arms (the 16-byte SSSE3 one) is defined like this:
```rust
#[cfg(target_feature = "ssse3")]
16 => transize(x86::_mm_shuffle_epi8, self, zeroing_idxs(idxs)),
```
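Here `zeroing_idxs` is a function that converts every out-of-bounds index to `0xFF`. `pshufb` writes zero for any index byte whose high bit is set, so `0xFF` turns those lanes into zeros.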
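To see why the extra pass exists, here is a scalar model of `pshufb` on one 16-byte lane, plus a model of `zeroing_idxs`. The names `pshufb_model`, `zeroing_idxs_model` and `swizzle16_model` are hypothetical stand-ins written for illustration, not the crate's code. The `zeroing_idxs` model assumes that it maps every index `>= 16` to `0xFF`, as described above.

```rust
/// Scalar model of `_mm_shuffle_epi8` (pshufb) on one 16-byte lane:
/// an index with bit 7 set yields 0; otherwise only its low 4 bits are used.
fn pshufb_model(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (o, &idx) in out.iter_mut().zip(idxs.iter()) {
        *o = if idx & 0x80 != 0 { 0 } else { bytes[(idx & 0x0F) as usize] };
    }
    out
}

/// Hypothetical model of `zeroing_idxs` for 16 lanes: in-bounds indices pass
/// through, everything else becomes 0xFF so that pshufb zeroes that byte.
fn zeroing_idxs_model(idxs: [u8; 16]) -> [u8; 16] {
    idxs.map(|i| if i < 16 { i } else { 0xFF })
}

/// Model of the SSSE3 arm: it always pays for the zeroing pass.
fn swizzle16_model(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    pshufb_model(bytes, zeroing_idxs_model(idxs))
}
```

Without the zeroing pass, an index such as 19 would wrap around to byte 3 instead of producing 0. When every out-of-bounds index is already `0xFF`, however, `zeroing_idxs_model` returns its input unchanged, so the pass is pure overhead.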
The problem: there are situations where the user can guarantee that out-of-bounds indices are already `0xFF`, but there is no way to communicate this to `swizzle_dyn`, so it always spends extra instructions on `zeroing_idxs`.

Trying to state the guarantee with `assert_unchecked` has no effect, which means the only way to avoid the overhead is a new function that doesn't call `zeroing_idxs` in the first place.
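A sketch of the kind of function described above, under stated assumptions: the name `swizzle16_oob_ff` is hypothetical, and it takes plain `[u8; 16]` arrays instead of `Simd<u8, 16>` so that it builds on stable Rust. When SSSE3 is detected at runtime, it passes the indices straight to `_mm_shuffle_epi8`. Otherwise, it falls back to a scalar loop with the same semantics. The caller must ensure that every index is `< 16` or `0xFF`. Nothing here checks that precondition.

```rust
/// Hypothetical 16-byte swizzle that skips the zeroing pass. Precondition
/// (caller's responsibility): every index is < 16 or 0xFF.
pub fn swizzle16_oob_ff(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was verified at runtime just above.
            return unsafe { pshufb_ssse3(bytes, idxs) };
        }
    }
    // Portable fallback with pshufb semantics: high bit set -> 0.
    let mut out = [0u8; 16];
    for (o, &idx) in out.iter_mut().zip(idxs.iter()) {
        *o = if idx & 0x80 != 0 { 0 } else { bytes[(idx & 0x0F) as usize] };
    }
    out
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn pshufb_ssse3(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::{__m128i, _mm_loadu_si128, _mm_shuffle_epi8, _mm_storeu_si128};
    unsafe {
        // Unaligned loads/stores, so plain arrays are fine.
        let table = _mm_loadu_si128(bytes.as_ptr() as *const __m128i);
        let index = _mm_loadu_si128(idxs.as_ptr() as *const __m128i);
        // No zeroing step: 0xFF indices already have bit 7 set.
        let shuffled = _mm_shuffle_epi8(table, index);
        let mut out = [0u8; 16];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, shuffled);
        out
    }
}
```

Because the precondition is only documented, a real API would likely need to be an `unsafe fn` or be clearly named. The sketch keeps it safe because an index that violates the precondition still produces some byte, not undefined behaviour.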
EDIT: The same applies to `avx2_pshufb`:
```rust
let hihi = avx2_cross_shuffle::<0x11>(bytes.into(), bytes.into());
let hi_shuf = Simd::from(avx2_half_pshufb(
    hihi,        // duplicate the vector's top half
    idxs.into(), // so that using only 4 bits of an index still picks bytes 16-31
));
// A zero-fill during the compose step gives the "all-Neon-like" OOB-is-0 semantics
let compose = idxs.simd_lt(high).select(hi_shuf, Simd::splat(0));
let lolo = avx2_cross_shuffle::<0x00>(bytes.into(), bytes.into());
let lo_shuf = Simd::from(avx2_half_pshufb(lolo, idxs.into()));
// Repeat, then pick indices < 16, overwriting indices 0-15 from previous compose step
let compose = idxs.simd_lt(mid).select(lo_shuf, compose);
compose
```
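If out-of-bounds indices are guaranteed to be `0xFF`, the first `compose` (the zero-fill `select`) could be removed entirely and `hi_shuf` used in its place. `avx2_half_pshufb` is a byte shuffle with `pshufb` semantics, so it already yields 0 for any index with its high bit set.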
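A scalar model of the quoted `avx2_pshufb` body makes this easy to check. All function names are hypothetical. The model assumes that `high` is a splat of 32, that `mid` is a splat of 16, and that `avx2_cross_shuffle::<0x11>` and `avx2_cross_shuffle::<0x00>`, applied to `(bytes, bytes)`, copy the upper and lower 16-byte half respectively into both lanes. Setting `zero_fill = false` drops the first `compose`.

```rust
/// Scalar model of a 256-bit pshufb: each 16-byte lane is shuffled on its own,
/// using only the low 4 bits of the index; an index with bit 7 set gives 0.
fn half_pshufb_model(table: [u8; 32], idxs: [u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        let lane_base = (i / 16) * 16;
        let idx = idxs[i];
        out[i] = if idx & 0x80 != 0 { 0 } else { table[lane_base + (idx & 0x0F) as usize] };
    }
    out
}

/// Assumed behaviour of avx2_cross_shuffle::<0x11> (high) / ::<0x00> (low)
/// on (bytes, bytes): the chosen half is duplicated into both lanes.
fn dup_half(bytes: [u8; 32], high: bool) -> [u8; 32] {
    let start = if high { 16 } else { 0 };
    core::array::from_fn(|i| bytes[start + i % 16])
}

/// Model of the quoted body. `zero_fill = true` keeps the first compose
/// (select against a splat of 32); `false` uses `hi_shuf` directly.
fn avx2_pshufb_model(bytes: [u8; 32], idxs: [u8; 32], zero_fill: bool) -> [u8; 32] {
    let hi_shuf = half_pshufb_model(dup_half(bytes, true), idxs);
    let compose: [u8; 32] = if zero_fill {
        core::array::from_fn(|i| if idxs[i] < 32 { hi_shuf[i] } else { 0 })
    } else {
        hi_shuf
    };
    let lo_shuf = half_pshufb_model(dup_half(bytes, false), idxs);
    // Indices < 16 take the low-half shuffle; the rest keep `compose`.
    core::array::from_fn(|i| if idxs[i] < 16 { lo_shuf[i] } else { compose[i] })
}
```

Both variants agree whenever every out-of-bounds index is `0xFF`. They only diverge for out-of-bounds indices whose high bit is clear (for example 40), which is exactly the case the zero-fill guards against.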
Code referenced above: `portable-simd/crates/core_simd/src/swizzle_dyn.rs`, lines 52 to 53 (the SSSE3 arm of `swizzle_dyn`) and lines 159 to 170 (`avx2_pshufb`), both at commit 32ba8ed.