Support group_size=64 in HybridW4A16 and wvSplitK_int4_g #905
Open
mgehre-amd wants to merge 3 commits into gfx11
Conversation
The HIP wvSplitK_int4_g C++ kernel only supported group_size 32 and 128, but HybridW4A16LinearKernel accepted 32, 64, 128, and 256. When a model using group_size=64 (e.g. RedHatAI/Qwen3-1.7B-quantized.w4a16) hit the decode path, the C++ kernel rejected it at runtime.

The kernel template already handles arbitrary group sizes that are multiples of A_CHUNK (16), so the fix extends the TORCH_CHECK and the WVSPLIT_INT4G_GS dispatch macro to include 64. SUPPORTED_GROUP_SIZES is narrowed to [32, 64, 128] so there is no mismatch between what can_implement accepts and what the C++ kernel supports.

Build time impact: skinny_gemms_int4.hip.o compile time increases from 158s to 233s (+47%) due to the additional template instantiations for group_size=64.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
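For reference, a minimal sketch of what the dispatch-side change might look like. TORCH_CHECK and the macro name WVSPLIT_INT4G_GS are taken from the PR description; the macro body, the launcher, and the function signatures below are illustrative placeholders, not the actual vLLM source.

```cpp
// Sketch only: macro body, launcher, and signatures are placeholders.
#include <torch/all.h>

// Placeholder launcher; the group size must be a compile-time constant
// because the HIP kernel is templated on it.
template <int GROUP_SIZE>
void launch_wvSplitK_int4_g(const at::Tensor& in_a, const at::Tensor& in_b,
                            at::Tensor& out_c) {
  // ... configure grid/block and launch the templated kernel here ...
}

// Map the runtime group_size onto a compile-time template argument.
// The PR adds the `case 64` branch next to the existing 32 and 128.
#define WVSPLIT_INT4G_GS(GS, ...)                                 \
  switch (GS) {                                                   \
    case 32:  launch_wvSplitK_int4_g<32>(__VA_ARGS__);  break;    \
    case 64:  launch_wvSplitK_int4_g<64>(__VA_ARGS__);  break;    \
    case 128: launch_wvSplitK_int4_g<128>(__VA_ARGS__); break;    \
  }

void wvSplitK_int4_g(const at::Tensor& in_a, const at::Tensor& in_b,
                     at::Tensor& out_c, int64_t group_size) {
  // Runtime guard widened from {32, 128} to {32, 64, 128}.
  TORCH_CHECK(group_size == 32 || group_size == 64 || group_size == 128,
              "wvSplitK_int4_g: unsupported group_size ", group_size);
  WVSPLIT_INT4G_GS(group_size, in_a, in_b, out_c);
}
```

Each new `case` forces another full set of kernel template instantiations, which is consistent with the reported growth in skinny_gemms_int4.hip.o compile time.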
eble-amd reviewed on Apr 27, 2026
The main wvSplitK_int4_g function was updated for group_size=64, but the two VLLM_SKINNY_GEMM_SWEEP sweep variants (wvSplitK_int4g_sweep and wvSplitK_int4g_hf_sweep) still had hard-coded 32/128 checks and dispatch; this commit extends them the same way. It also updates the docstring on wvSplitK_int4_g.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
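Applied to the sweep variants, the change might look like the following sketch. The function names come from the commit message; the signatures are placeholders, and WVSPLIT_INT4G_GS refers to the placeholder macro in the earlier sketch.

```cpp
// Sketch only: both sweep entry points (compiled only when
// VLLM_SKINNY_GEMM_SWEEP is defined) get the same widened guard and
// group-size dispatch as the main function. Signatures are illustrative.
#ifdef VLLM_SKINNY_GEMM_SWEEP
void wvSplitK_int4g_sweep(const at::Tensor& in_a, const at::Tensor& in_b,
                          at::Tensor& out_c, int64_t group_size) {
  // Was: group_size == 32 || group_size == 128
  TORCH_CHECK(group_size == 32 || group_size == 64 || group_size == 128,
              "wvSplitK_int4g_sweep: unsupported group_size ", group_size);
  WVSPLIT_INT4G_GS(group_size, in_a, in_b, out_c);
}

void wvSplitK_int4g_hf_sweep(const at::Tensor& in_a, const at::Tensor& in_b,
                             at::Tensor& out_c, int64_t group_size) {
  TORCH_CHECK(group_size == 32 || group_size == 64 || group_size == 128,
              "wvSplitK_int4g_hf_sweep: unsupported group_size ", group_size);
  WVSPLIT_INT4G_GS(group_size, in_a, in_b, out_c);
}
#endif  // VLLM_SKINNY_GEMM_SWEEP
```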
Resolve conflict in skinny_gemms_int4.cu: gfx11 moved the dispatch macros to file scope (shared with MoE); apply the group_size=64 and N=1 tuning changes to both the regular and MoE macro sets.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
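To illustrate why the resolution touches two places, here is a sketch of the file-scope layout the commit describes. Only WVSPLIT_INT4G_GS is named in this PR; MOE_WVSPLIT_INT4G_GS and its launcher are hypothetical names standing in for the MoE macro set.

```cpp
// Sketch only: on gfx11 the dispatch macros live at file scope, and a
// parallel set serves the MoE path, so the new `case 64` has to be added
// to both sets. The MoE names below are hypothetical.
template <int GROUP_SIZE>
void launch_moe_wvSplitK_int4_g(const at::Tensor& in_a,
                                const at::Tensor& in_b, at::Tensor& out_c);

// Without the matching `case 64` here, MoE calls with group_size=64 would
// fall through the switch and launch nothing.
#define MOE_WVSPLIT_INT4G_GS(GS, ...)                                 \
  switch (GS) {                                                       \
    case 32:  launch_moe_wvSplitK_int4_g<32>(__VA_ARGS__);  break;    \
    case 64:  launch_moe_wvSplitK_int4_g<64>(__VA_ARGS__);  break;    \
    case 128: launch_moe_wvSplitK_int4_g<128>(__VA_ARGS__); break;    \
  }
```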
eble-amd approved these changes on Apr 28, 2026