Add int8/int16 reduction fallbacks for SSE2 and AVX #65
Open
zsoerenm wants to merge 1 commit into aff3ct:master
The _reduction and _Reduction specializations for 8-bit and 16-bit types were guarded by #ifdef __SSSE3__ (SSE backend) and #ifdef __AVX2__ (AVX backend), with no #else branch. This caused a runtime crash when hadd() was called on these types from code compiled for baseline SSE2 or vanilla AVX, as the default template calls exit(-1).

SSE2 fallback:
- Replace _mm_shuffle_epi8 (SSSE3) with _mm_shufflelo/hi_epi16 (SSE2) for 16-bit pair swaps
- Use _mm_srli/slli_epi16 + OR (SSE2) for adjacent byte swaps

AVX fallback (without AVX2):
- Replace _mm256_permute4x64_epi64, _mm256_shuffle_epi32, and _mm256_shuffle_epi8 (all AVX2) with _mm256_shuffle_ps (AVX) for 64/32-bit shuffles, and extract-to-128-bit + SSSE3 _mm_shuffle_epi8 for byte-level shuffles

Update hadd tests to run int8_t/int16_t on all ISA versions.

Fixes aff3ct/aff3ct#216

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
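To make the SSE2 fallback concrete, here is a minimal sketch of the two swaps named in the commit message, combined into a horizontal add over sixteen `int8_t` lanes. It is not the code from this PR: the function names (`swap_adjacent_epi16`, `swap_adjacent_epi8`, `hadd_epi8_sse2`) are hypothetical, and the shape of the reduction tree is an assumption based on the description above. Only SSE2 intrinsics are used.

```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2 only (also provides _MM_SHUFFLE)

// Hypothetical helper: swap the two 16-bit elements inside every 32-bit lane.
// Stands in for an SSSE3 _mm_shuffle_epi8 with a fixed byte mask.
static inline __m128i swap_adjacent_epi16(__m128i v) {
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // low four words
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1)); // high four words
    return v;
}

// Hypothetical helper: swap the two bytes inside every 16-bit lane.
// Shifting right moves the high byte down, shifting left moves the low
// byte up, and OR merges the two halves.
static inline __m128i swap_adjacent_epi8(__m128i v) {
    return _mm_or_si128(_mm_srli_epi16(v, 8), _mm_slli_epi16(v, 8));
}

// Sketch of a wrapping horizontal add over 16 int8_t lanes.
// After the four steps every lane holds the full sum; lane 0 is returned.
static inline int8_t hadd_epi8_sse2(__m128i v) {
    v = _mm_add_epi8(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2))); // swap 64-bit halves
    v = _mm_add_epi8(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1))); // swap 32-bit pairs
    v = _mm_add_epi8(v, swap_adjacent_epi16(v));                        // swap 16-bit pairs
    v = _mm_add_epi8(v, swap_adjacent_epi8(v));                         // swap adjacent bytes
    return static_cast<int8_t>(static_cast<uint8_t>(_mm_cvtsi128_si32(v) & 0xFF));
}
```

Each step adds the vector to a copy of itself with neighbouring groups exchanged, so the group size halves from 64 bits down to 8 bits. The sum wraps modulo 256, as `_mm_add_epi8` does.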
## Summary

The `_reduction` and `_Reduction` specializations for `int8_t`, `int16_t`, `uint8_t`, and `uint16_t` were guarded by `#ifdef __SSSE3__` (SSE backend) and `#ifdef __AVX2__` (AVX backend), with no `#else` fallback. This caused a runtime crash (`exit(-1)`) when `hadd()` was called on these types from code compiled for baseline SSE2 or vanilla AVX.

This was discovered via aff3ct/aff3ct#216, where `bgemmt<int8_t>()` calls `mipp::hadd<int8_t>()` during LDPC matrix operations.

## Changes

SSE2 fallback (`mipp_impl_SSE.hxx`):
- Replace `_mm_shuffle_epi8` (SSSE3) with `_mm_shufflelo_epi16`/`_mm_shufflehi_epi16` (SSE2) for 16-bit pair swaps
- Use `_mm_srli_epi16`/`_mm_slli_epi16` + OR (SSE2) for adjacent byte swaps

AVX fallback without AVX2 (`mipp_impl_AVX.hxx`):
- Replace `_mm256_permute4x64_epi64`, `_mm256_shuffle_epi32`, `_mm256_shuffle_epi8` (all AVX2) with `_mm256_shuffle_ps` (AVX) for 64/32-bit shuffles
- Extract to 128-bit + SSSE3 `_mm_shuffle_epi8` for byte-level shuffles

Tests (`tests/src/reductions/hadd.cpp`):
- Update the `MIPP_INSTR_VERSION >= 31` guard so `int8_t`/`int16_t` hadd tests run on all SSE versions

## Test results

## Test plan

- `hadd` tests for SSE2-only (`-msse2 -mno-sse3 -mno-ssse3 ...`)
- `hadd` tests for AVX-only (`-mavx -mno-avx2`)
- `hadd` tests for native AVX2 (`-march=native`)

🤖 Generated with Claude Code
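For readers unfamiliar with the guard pattern discussed in the summary, the sketch below shows one way a single helper can carry both an SSSE3 path and an SSE2 `#else` fallback, so that a build without `__SSSE3__` still gets a working specialization instead of falling through to a default. The names `swap16_in_32` and `hadd_i16` are hypothetical and the code is an illustration under those assumptions, not an excerpt from `mipp_impl_SSE.hxx`.

```cpp
#include <cstdint>
#include <emmintrin.h>   // SSE2 (baseline on x86-64)
#ifdef __SSSE3__
#include <tmmintrin.h>   // SSSE3: _mm_shuffle_epi8
#endif

// Hypothetical helper: swap the two 16-bit elements in every 32-bit lane.
static inline __m128i swap16_in_32(__m128i v) {
#ifdef __SSSE3__
    // SSSE3 path: one byte shuffle with a fixed mask.
    const __m128i mask = _mm_setr_epi8(2, 3, 0, 1, 6, 7, 4, 5,
                                       10, 11, 8, 9, 14, 15, 12, 13);
    return _mm_shuffle_epi8(v, mask);
#else
    // SSE2 fallback: shuffle the low and high four words separately.
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
    return _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
#endif
}

// Sketch of a wrapping horizontal add over eight int16_t lanes.
// Both build configurations produce the same result.
static inline int16_t hadd_i16(__m128i v) {
    v = _mm_add_epi16(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2))); // swap 64-bit halves
    v = _mm_add_epi16(v, _mm_shuffle_epi32(v, _MM_SHUFFLE(2, 3, 0, 1))); // swap 32-bit pairs
    v = _mm_add_epi16(v, swap16_in_32(v));                               // swap 16-bit pairs
    return static_cast<int16_t>(static_cast<uint16_t>(_mm_cvtsi128_si32(v) & 0xFFFF));
}
```

Compiling the same file once with `-mssse3` and once with `-msse2 -mno-ssse3` exercises both branches, which mirrors the idea behind the test plan above.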