
Add int8/int16 reduction fallbacks for SSE2 and AVX #65

Open
zsoerenm wants to merge 1 commit into aff3ct:master from zsoerenm:fix/int8-int16-reduction-fallback

@zsoerenm

Summary

The _reduction and _Reduction specializations for int8_t, int16_t, uint8_t, and uint16_t were guarded by #ifdef __SSSE3__ (SSE backend) and #ifdef __AVX2__ (AVX backend), with no #else fallback. As a result, calling hadd() on these types from code compiled for baseline SSE2, or for AVX without AVX2, fell through to the generic template, which calls exit(-1) and terminates the program at runtime.

This was discovered via aff3ct/aff3ct#216, where bgemmt<int8_t>() calls mipp::hadd<int8_t>() during LDPC matrix operations.
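The failure mode is an ordinary C++ preprocessor and template pattern. The following sketch is hypothetical and is not MIPP's code: `hsum` and `sum_wrapping` are illustrative names, and the scalar bodies stand in for the SIMD reductions. It shows why a specialization that exists only under `#ifdef` lets baseline builds fall through to an aborting generic template, and where an `#else` fallback closes the gap.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <type_traits>

// Hypothetical scalar helper standing in for the SIMD reduction bodies.
// Accumulates in the unsigned type so overflow wraps like SIMD lanes do.
template <typename T>
T sum_wrapping(const T* v, int n)
{
    using U = typename std::make_unsigned<T>::type;
    U acc = 0;
    for (int i = 0; i < n; ++i)
        acc = static_cast<U>(acc + static_cast<U>(v[i]));
    return static_cast<T>(acc);
}

// Generic template: the catch-all for types without a specialization.
// Like the default described above, it aborts instead of computing anything.
template <typename T>
T hsum(const T* v, int n)
{
    (void)v;
    (void)n;
    std::fprintf(stderr, "hsum: no implementation for this type/ISA\n");
    std::exit(-1);
}

#ifdef __SSSE3__
// Original situation: the int8_t specialization existed only in this branch.
template <>
int8_t hsum<int8_t>(const int8_t* v, int n)
{
    return sum_wrapping(v, n); // stands in for a pshufb-based reduction
}
#else
// The fix: an #else branch so SSE2-only builds also get a specialization
// instead of falling through to the aborting generic template.
template <>
int8_t hsum<int8_t>(const int8_t* v, int n)
{
    return sum_wrapping(v, n); // stands in for an SSE2-only reduction
}
#endif
```

Without the `#else` branch, a build without `__SSSE3__` would still compile, because the generic template matches `int8_t`, and would only fail when the call executes.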

Changes

SSE2 fallback (mipp_impl_SSE.hxx):

  • Replaces _mm_shuffle_epi8 (SSSE3) with _mm_shufflelo_epi16 / _mm_shufflehi_epi16 (SSE2) for 16-bit pair swaps
  • Uses _mm_srli_epi16 / _mm_slli_epi16 + OR (SSE2) for adjacent byte swaps
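A minimal sketch of the SSE2-only technique described above, written as standalone functions rather than as MIPP's _reduction specializations (`hsum_epi16_sse2` and `hsum_epi8_sse2` are hypothetical names). Each step adds the vector to a permuted copy of itself, halving the number of distinct partial sums; the 16-bit pair swap and the adjacent byte swap are the steps that replace `_mm_shuffle_epi8`.

```cpp
#include <emmintrin.h> // SSE2 intrinsics only
#include <cassert>
#include <cstdint>

// Horizontal sum of 8 int16 lanes with SSE2 only (wrapping arithmetic).
static inline int16_t hsum_epi16_sse2(__m128i x)
{
    // Swap the two 64-bit halves and add: lane i += lane i^4.
    x = _mm_add_epi16(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
    // Swap adjacent 32-bit elements and add: lane i += lane i^2.
    x = _mm_add_epi16(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1)));
    // Swap the 16-bit pair inside each 32-bit element. pshufb is SSSE3, so
    // permute the low and high four words separately with SSE2 shuffles.
    __m128i swapped = _mm_shufflehi_epi16(
        _mm_shufflelo_epi16(x, _MM_SHUFFLE(2, 3, 0, 1)), _MM_SHUFFLE(2, 3, 0, 1));
    x = _mm_add_epi16(x, swapped);
    // Every lane now holds the total; read lane 0.
    return static_cast<int16_t>(_mm_extract_epi16(x, 0));
}

// Horizontal sum of 16 int8 lanes with SSE2 only (wrapping arithmetic).
static inline int8_t hsum_epi8_sse2(__m128i x)
{
    x = _mm_add_epi8(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
    x = _mm_add_epi8(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 3, 0, 1)));
    x = _mm_add_epi8(x, _mm_shufflehi_epi16(
        _mm_shufflelo_epi16(x, _MM_SHUFFLE(2, 3, 0, 1)), _MM_SHUFFLE(2, 3, 0, 1)));
    // Swap the two bytes inside each 16-bit lane: (x >> 8) | (x << 8).
    __m128i swapped = _mm_or_si128(_mm_srli_epi16(x, 8), _mm_slli_epi16(x, 8));
    x = _mm_add_epi8(x, swapped);
    // Low byte of lane 0 holds the total.
    return static_cast<int8_t>(_mm_cvtsi128_si32(x) & 0xFF);
}
```

Both functions wrap on overflow, so the result is the lane sum modulo 2^16 or 2^8. SSE2 is part of the x86-64 baseline, so no extra compiler flags are needed.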

AVX fallback without AVX2 (mipp_impl_AVX.hxx):

  • Replaces _mm256_permute4x64_epi64, _mm256_shuffle_epi32, _mm256_shuffle_epi8 (all AVX2) with _mm256_shuffle_ps (AVX) for 64/32-bit shuffles
  • Extracts the two 128-bit halves and uses SSSE3 _mm_shuffle_epi8 for byte-level shuffles (every AVX-capable CPU also supports SSSE3)
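A sketch of a 32-lane int8 sum that avoids AVX2, under two assumptions: GCC or Clang, for `__attribute__((target("avx")))`, which enables AVX for this one function without building the whole file with -mavx, and a CPU with AVX, checked at runtime by the caller. Unlike the PR, which uses `_mm256_shuffle_ps` for the 64/32-bit steps, this sketch splits into 128-bit halves immediately (AVX1 has no 256-bit integer add) and does every shuffle with SSSE3 `_mm_shuffle_epi8`. `hsum32_epi8_avx` is a hypothetical name.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// Wrapping sum of 32 int8 values using AVX + SSSE3 only (no AVX2).
// Takes a pointer so no __m256i crosses a non-AVX function boundary.
__attribute__((target("avx")))
int8_t hsum32_epi8_avx(const int8_t* p)
{
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(p));
    // _mm256_add_epi8 is AVX2, so fold the 256-bit vector into 128 bits.
    __m128i lo = _mm256_castsi256_si128(v);
    __m128i hi = _mm256_extractf128_si256(v, 1);
    __m128i x = _mm_add_epi8(lo, hi);
    // pshufb masks: each result byte i takes source byte mask[i].
    const __m128i swap64 = _mm_setr_epi8(8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7);
    const __m128i swap32 = _mm_setr_epi8(4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11);
    const __m128i swap16 = _mm_setr_epi8(2, 3, 0, 1, 6, 7, 4, 5, 10, 11, 8, 9, 14, 15, 12, 13);
    const __m128i swap8 = _mm_setr_epi8(1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
    x = _mm_add_epi8(x, _mm_shuffle_epi8(x, swap64));
    x = _mm_add_epi8(x, _mm_shuffle_epi8(x, swap32));
    x = _mm_add_epi8(x, _mm_shuffle_epi8(x, swap16));
    x = _mm_add_epi8(x, _mm_shuffle_epi8(x, swap8));
    // Every byte now holds the total; read byte 0.
    return static_cast<int8_t>(_mm_cvtsi128_si32(x) & 0xFF);
}
```

Callers should guard the call, for example with GCC/Clang's `__builtin_cpu_supports("avx")`, since executing AVX instructions on a CPU without AVX raises an illegal-instruction fault.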

Tests (tests/src/reductions/hadd.cpp):

  • Removed the MIPP_INSTR_VERSION >= 31 guard so the int8_t/int16_t hadd tests run on all SSE versions

Test results

ISA              Assertions   Result
SSE2             1200         All passed
AVX (no AVX2)    400          All passed
AVX2 (native)    1200         All passed

Test plan

  • Built and ran hadd tests for SSE2-only (-msse2 -mno-sse3 -mno-ssse3 ...)
  • Built and ran hadd tests for AVX-only (-mavx -mno-avx2)
  • Built and ran hadd tests for native AVX2 (-march=native)
  • Verified int8_t/int16_t sections now execute on SSE2 (previously skipped)
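To confirm that a build made with restricted flags really excludes the newer extensions, a translation unit can report what the compiler targeted. This sketch relies on the GCC/Clang predefined macros `__SSE2__`, `__SSSE3__`, `__AVX__`, and `__AVX2__` (MSVC does not define all of them); `compiled_simd_level` is a hypothetical helper, not part of MIPP or its test suite.

```cpp
#include <cassert>
#include <cstring>

// Highest of the ISA levels relevant to this PR that the compiler targeted
// for this translation unit. SSE3/SSE4.x builds report "SSSE3" or lower,
// since only these four levels matter here.
inline const char* compiled_simd_level()
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none";
#endif
}
```

An SSE2-only test build should report "SSE2", and a -mavx -mno-avx2 build should report "AVX".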

🤖 Generated with Claude Code

The _reduction and _Reduction specializations for 8-bit and 16-bit types
were guarded by #ifdef __SSSE3__ (SSE backend) and #ifdef __AVX2__ (AVX
backend), with no #else branch. This caused a runtime crash when hadd()
was called on these types from code compiled for baseline SSE2 or
vanilla AVX, as the default template calls exit(-1).

SSE2 fallback:
- Replace _mm_shuffle_epi8 (SSSE3) with _mm_shufflelo/hi_epi16 (SSE2)
  for 16-bit pair swaps
- Use _mm_srli/slli_epi16 + OR (SSE2) for adjacent byte swaps

AVX fallback (without AVX2):
- Replace _mm256_permute4x64_epi64, _mm256_shuffle_epi32, and
  _mm256_shuffle_epi8 (all AVX2) with _mm256_shuffle_ps (AVX) for
  64/32-bit shuffles, and extract-to-128-bit + SSSE3 _mm_shuffle_epi8
  for byte-level shuffles

Update hadd tests to run int8_t/int16_t on all ISA versions.

Fixes aff3ct/aff3ct#216

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>