swizzle_dyn: unsafe variant where you guarantee that indices are either in-bounds or 0xFF #486

@SuperSamus

One of the swizzle variants is defined like this:

#[cfg(target_feature = "ssse3")]
16 => transize(x86::_mm_shuffle_epi8, self, zeroing_idxs(idxs)),

Here zeroing_idxs is a function that converts every out-of-bounds index to 0xFF, because _mm_shuffle_epi8 writes zero to any lane whose index byte has its top bit set.
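To make the role of zeroing_idxs concrete, here is a minimal scalar sketch of the 16-byte case. The names pshufb_model and zeroing_idxs_model are hypothetical stand-ins written for illustration, not the library's code; the model only assumes the documented pshufb behavior (top bit set gives 0, otherwise the low 4 bits select a byte).

```rust
/// Scalar model of one 128-bit `pshufb` (`_mm_shuffle_epi8`).
/// Hypothetical helper for illustration only.
fn pshufb_model(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        // Top bit set: the lane is zeroed. Otherwise only the low 4 bits are used.
        if idxs[i] & 0x80 == 0 {
            out[i] = bytes[(idxs[i] & 0x0F) as usize];
        }
    }
    out
}

/// Scalar model of the index fix-up: every index >= 16 becomes 0xFF.
/// Hypothetical helper; the real function works on SIMD vectors.
fn zeroing_idxs_model(idxs: [u8; 16]) -> [u8; 16] {
    idxs.map(|i| if i < 16 { i } else { 0xFF })
}
```

Without the fix-up, an index such as 16 would wrap around to byte 0 instead of producing 0, which is why the safe path cannot skip it for arbitrary input.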

The problem: in some situations the caller can guarantee that every out-of-bounds index is already 0xFF, but there is no way to communicate this to swizzle_dyn, which always pays for the redundant zeroing_idxs pass.

Stating the guarantee with assert_unchecked does not make the compiler drop that pass, so the only way to avoid the overhead is a new function that doesn't call zeroing_idxs in the first place.
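As a rough sketch of what such a function could look like, the scalar model below contrasts the checked path with an unsafe variant that trusts the caller. All names here (shuffle16, swizzle_checked, swizzle_prezeroed) are hypothetical and chosen for illustration; this is not a proposed signature for the real API.

```rust
/// Scalar stand-in for `_mm_shuffle_epi8` (hypothetical helper).
fn shuffle16(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        if idxs[i] & 0x80 == 0 {
            out[i] = bytes[(idxs[i] & 0x0F) as usize];
        }
    }
    out
}

/// Models the current safe behavior: indices are always sanitized first.
fn swizzle_checked(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    shuffle16(bytes, idxs.map(|i| if i < 16 { i } else { 0xFF }))
}

/// Hypothetical unsafe variant that skips the sanitizing pass.
///
/// # Safety
/// Every index must be either in bounds (< 16) or exactly 0xFF.
unsafe fn swizzle_prezeroed(bytes: [u8; 16], idxs: [u8; 16]) -> [u8; 16] {
    // Checked only in debug builds; release builds trust the caller.
    debug_assert!(idxs.iter().all(|&i| i < 16 || i == 0xFF));
    shuffle16(bytes, idxs)
}
```

When the caller upholds the contract, both functions return the same result; the unsafe one simply omits the work that the contract makes redundant.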

EDIT: The same applies to avx2_pshufb:

let hihi = avx2_cross_shuffle::<0x11>(bytes.into(), bytes.into());
let hi_shuf = Simd::from(avx2_half_pshufb(
    hihi, // duplicate the vector's top half
    idxs.into(), // so that using only 4 bits of an index still picks bytes 16-31
));
// A zero-fill during the compose step gives the "all-Neon-like" OOB-is-0 semantics
let compose = idxs.simd_lt(high).select(hi_shuf, Simd::splat(0));
let lolo = avx2_cross_shuffle::<0x00>(bytes.into(), bytes.into());
let lo_shuf = Simd::from(avx2_half_pshufb(lolo, idxs.into()));
// Repeat, then pick indices < 16, overwriting indices 0-15 from previous compose step
let compose = idxs.simd_lt(mid).select(lo_shuf, compose);
compose

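Under the same guarantee, the first compose (the zero-fill select) could be removed entirely, since the shuffle already yields 0 for every lane whose index is 0xFF.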
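The claim can be checked with a scalar model of the 32-byte path. This is a sketch under stated assumptions: half_pshufb, dup_half, pshufb32_current and pshufb32_prezeroed are hypothetical names, and the thresholds mid and high from the snippet are assumed to be 16 and 32.

```rust
/// Scalar model of 256-bit `vpshufb`: each 128-bit lane shuffles independently,
/// and an index with its top bit set produces 0. Hypothetical helper.
fn half_pshufb(src: [u8; 32], idxs: [u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        if idxs[i] & 0x80 == 0 {
            let lane_base = (i / 16) * 16;
            out[i] = src[lane_base + (idxs[i] & 0x0F) as usize];
        }
    }
    out
}

/// Duplicates the low or high 16 bytes into both lanes (models the cross shuffle).
fn dup_half(bytes: [u8; 32], high: bool) -> [u8; 32] {
    let base = if high { 16 } else { 0 };
    std::array::from_fn(|i| bytes[base + i % 16])
}

/// Models the existing code: zero-fill select first, then the low-half select.
fn pshufb32_current(bytes: [u8; 32], idxs: [u8; 32]) -> [u8; 32] {
    let hi_shuf = half_pshufb(dup_half(bytes, true), idxs);
    let lo_shuf = half_pshufb(dup_half(bytes, false), idxs);
    std::array::from_fn(|i| {
        let compose = if idxs[i] < 32 { hi_shuf[i] } else { 0 };
        if idxs[i] < 16 { lo_shuf[i] } else { compose }
    })
}

/// Models the variant without the first compose.
/// Only valid when every index is < 32 or exactly 0xFF.
fn pshufb32_prezeroed(bytes: [u8; 32], idxs: [u8; 32]) -> [u8; 32] {
    let hi_shuf = half_pshufb(dup_half(bytes, true), idxs);
    let lo_shuf = half_pshufb(dup_half(bytes, false), idxs);
    std::array::from_fn(|i| if idxs[i] < 16 { lo_shuf[i] } else { hi_shuf[i] })
}
```

In this model the two functions agree whenever the indices satisfy the guarantee, and they diverge for an out-of-bounds index such as 32, which is why the shortcut would have to be an unsafe, opt-in variant.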

Labels: C-feature-request (Category: a feature request, i.e. not implemented / a PR)