Add int8/int16 reduction fallbacks for SSE2 and AVX by zsoerenm · Pull Request #65 · aff3ct/MIPP

zsoerenm · 2026-03-23T12:31:33Z

Summary

The _reduction and _Reduction specializations for int8_t, int16_t, uint8_t, and uint16_t were guarded by #ifdef __SSSE3__ (SSE backend) and #ifdef __AVX2__ (AVX backend), with no #else fallback. When code was compiled for baseline SSE2 or for AVX without AVX2, no specialization existed for these types, so hadd() fell through to the default template, which calls exit(-1) and terminates the program at run time.

This was discovered via aff3ct/aff3ct#216, where bgemmt<int8_t>() calls mipp::hadd<int8_t>() during LDPC matrix operations.
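The failure mode described above can be sketched in isolation. This is a minimal illustration, not MIPP's actual code: the `Reduction` struct, its `apply` signature, and the `scalar_sum` helper are hypothetical names, and both branches use a scalar loop as a stand-in for the real intrinsic paths. The point is only that the `#else` branch keeps a specialization present when `__SSSE3__` is not defined, so the exiting primary template is never selected for `int8_t`.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical primary template: reached only when no specialization exists.
template <typename T>
struct Reduction {
    static T apply(const T*, int) {
        std::fprintf(stderr, "reduction not implemented for this type\n");
        std::exit(-1);  // the run-time termination described in the summary
    }
};

// Hypothetical scalar helper, standing in for the real SIMD code paths.
static inline int8_t scalar_sum(const int8_t* v, int n) {
    unsigned acc = 0;
    for (int i = 0; i < n; ++i) acc += static_cast<uint8_t>(v[i]);
    return static_cast<int8_t>(acc & 0xFF);  // wraps like 8-bit lane addition
}

#ifdef __SSSE3__
// Fast path: would use _mm_shuffle_epi8 in a real backend.
template <>
struct Reduction<int8_t> {
    static int8_t apply(const int8_t* v, int n) { return scalar_sum(v, n); }
};
#else
// Fallback: without this branch, Reduction<int8_t> would resolve to the
// primary template above and exit at run time.
template <>
struct Reduction<int8_t> {
    static int8_t apply(const int8_t* v, int n) { return scalar_sum(v, n); }
};
#endif
```

Deleting the `#else` branch and building without SSSE3 reproduces the shape of the bug: the call still compiles, and only fails once it runs.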

Changes

SSE2 fallback (mipp_impl_SSE.hxx):

Replaces _mm_shuffle_epi8 (SSSE3) with _mm_shufflelo_epi16 / _mm_shufflehi_epi16 (SSE2) for 16-bit pair swaps
Uses _mm_srli_epi16 / _mm_slli_epi16 + OR (SSE2) for adjacent byte swaps
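As a rough sketch of how these two SSE2-only swaps combine into a full 8-bit horizontal add, assuming a butterfly-style reduction in which every lane ends up holding the total: the helper names below are hypothetical and are not taken from mipp_impl_SSE.hxx.

```cpp
#include <emmintrin.h>  // SSE2 only
#include <cassert>
#include <cstdint>

// Swap the two bytes inside every 16-bit lane with shifts and OR (SSE2).
static inline __m128i swap_adjacent_bytes_sse2(__m128i v) {
    return _mm_or_si128(_mm_srli_epi16(v, 8), _mm_slli_epi16(v, 8));
}

// Swap the two 16-bit words inside every 32-bit lane (SSE2).
static inline __m128i swap_adjacent_words_sse2(__m128i v) {
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
    return _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
}

// Hypothetical helper: wrapping sum of 16 int8_t values read from `data`.
static inline int8_t hadd_i8_sse2(const int8_t* data) {
    assert(data != nullptr);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data));
    // Add the vector to a copy with its 64-bit halves swapped.
    v = _mm_add_epi8(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)));
    // Add a copy with neighbouring 32-bit lanes swapped.
    v = _mm_add_epi8(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1)));
    // Add a copy with neighbouring 16-bit words swapped.
    v = _mm_add_epi8(v, swap_adjacent_words_sse2(v));
    // Add a copy with neighbouring bytes swapped; every byte now holds the sum.
    v = _mm_add_epi8(v, swap_adjacent_bytes_sse2(v));
    return static_cast<int8_t>(_mm_cvtsi128_si32(v) & 0xFF);
}
```

The sum wraps modulo 256, as 8-bit lane addition does. Each step costs one or two extra instructions compared with a single _mm_shuffle_epi8, which is the price of staying within SSE2.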

AVX fallback without AVX2 (mipp_impl_AVX.hxx):

Replaces _mm256_permute4x64_epi64, _mm256_shuffle_epi32, _mm256_shuffle_epi8 (all AVX2) with _mm256_shuffle_ps (AVX) for 64/32-bit shuffles
Extracts to 128-bit halves + SSSE3 _mm_shuffle_epi8 for byte-level shuffles
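The 64/32-bit part of this approach might look like the sketch below. It uses int32_t lanes for brevity, so it leaves out the byte-level steps that int8_t and int16_t additionally need. The function names are hypothetical, and the scalar `#else` branch exists only so the snippet also builds without -mavx; neither is taken from mipp_impl_AVX.hxx.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

#if defined(__AVX__)
// 256-bit integer add built from two 128-bit SSE2 adds, because
// _mm256_add_epi32 requires AVX2.
static inline __m256i add_epi32_avx1(__m256i a, __m256i b) {
    __m128i lo = _mm_add_epi32(_mm256_castsi256_si128(a),
                               _mm256_castsi256_si128(b));
    __m128i hi = _mm_add_epi32(_mm256_extractf128_si256(a, 1),
                               _mm256_extractf128_si256(b, 1));
    return _mm256_insertf128_si256(_mm256_castsi128_si256(lo), hi, 1);
}

// Per-128-bit-lane shuffle of 32-bit elements, routed through the float
// domain so that only AVX is needed. The casts do not change any bits.
template <int Imm>
static inline __m256i shuffle32_avx1(__m256i v) {
    __m256 f = _mm256_castsi256_ps(v);
    return _mm256_castps_si256(_mm256_shuffle_ps(f, f, Imm));
}

// Hypothetical helper: wrapping sum of eight int32_t values.
static inline int32_t hadd_i32_avx1(const int32_t* data) {
    assert(data != nullptr);
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data));
    // Swap the two 128-bit lanes (AVX) and add.
    v = add_epi32_avx1(v, _mm256_permute2f128_si256(v, v, 0x01));
    // Swap 64-bit halves within each lane and add.
    v = add_epi32_avx1(v, shuffle32_avx1<_MM_SHUFFLE(1, 0, 3, 2)>(v));
    // Swap neighbouring 32-bit elements and add; every lane now holds the sum.
    v = add_epi32_avx1(v, shuffle32_avx1<_MM_SHUFFLE(2, 3, 0, 1)>(v));
    return _mm_cvtsi128_si32(_mm256_castsi256_si128(v));
}
#else
// Scalar stand-in so the sketch compiles when AVX is not enabled.
static inline int32_t hadd_i32_avx1(const int32_t* data) {
    assert(data != nullptr);
    uint32_t sum = 0;
    for (int i = 0; i < 8; ++i) sum += static_cast<uint32_t>(data[i]);
    return static_cast<int32_t>(sum);
}
#endif
```

Splitting every add into two 128-bit halves is what keeps this within plain AVX. For the narrower types, the same split leaves the byte-level shuffles to 128-bit instructions, as the second item above describes.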

Tests (tests/src/reductions/hadd.cpp):

Removed MIPP_INSTR_VERSION >= 31 guard so int8_t/int16_t hadd tests run on all SSE versions

Test results

ISA	Assertions	Result
SSE2	1200	All passed
AVX (no AVX2)	400	All passed
AVX2 (native)	1200	All passed

Test plan

Built and ran hadd tests for SSE2-only (-msse2 -mno-sse3 -mno-ssse3 ...)
Built and ran hadd tests for AVX-only (-mavx -mno-avx2)
Built and ran hadd tests for native AVX2 (-march=native)
Verified int8_t/int16_t sections now execute on SSE2 (previously skipped)

🤖 Generated with Claude Code

The _reduction and _Reduction specializations for 8-bit and 16-bit types were guarded by #ifdef __SSSE3__ (SSE backend) and #ifdef __AVX2__ (AVX backend), with no #else branch. This caused a runtime crash when hadd() was called on these types from code compiled for baseline SSE2 or vanilla AVX, as the default template calls exit(-1).

SSE2 fallback:
- Replace _mm_shuffle_epi8 (SSSE3) with _mm_shufflelo/hi_epi16 (SSE2) for 16-bit pair swaps
- Use _mm_srli/slli_epi16 + OR (SSE2) for adjacent byte swaps

AVX fallback (without AVX2):
- Replace _mm256_permute4x64_epi64, _mm256_shuffle_epi32, and _mm256_shuffle_epi8 (all AVX2) with _mm256_shuffle_ps (AVX) for 64/32-bit shuffles, and extract-to-128-bit + SSSE3 _mm_shuffle_epi8 for byte-level shuffles

Update hadd tests to run int8_t/int16_t on all ISA versions.

Fixes aff3ct/aff3ct#216

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zsoerenm mentioned this pull request Mar 23, 2026

bgemmt crashes: mipp::_Reduction::apply<int8_t> undefined for x86 aff3ct/aff3ct#216

Open

zsoerenm wants to merge 1 commit into aff3ct:master from zsoerenm:fix/int8-int16-reduction-fallback
