
Added planar types to speed up complex half precision GEMMs #1142

Open
cliffburdick wants to merge 9 commits into main from planar_tensor
@cliffburdick
Collaborator

No description provided.

@copy-pr-bot

copy-pr-bot bot commented Mar 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps bot commented Mar 19, 2026

Greptile Summary

This PR introduces matxFp16ComplexPlanar / matxBf16ComplexPlanar tag types and the companion planar() / interleaved() operators to allow users to pre-stage tensors in the split real/imaginary layout that cuBLASLt expects for complex half-precision GEMMs, avoiding a per-call conversion copy. The three previously-flagged issues (SetOp EPT regression, TotalSize offset for non-contiguous views, and c_adj leading-dimension mismatch) have all been addressed in the latest commits.

  • P1 — JIT imaginary-plane offset wrong in interleaved.h:93: out_dims_[rank_idx] / 2 evaluates to M/2 instead of M because out_dims_ already stores the halved output size. When MATX_EN_JIT is enabled and interleaved() appears in a JIT graph, the imaginary component of every element will be fetched from the wrong location, silently producing incorrect complex results.
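For readers unfamiliar with the two layouts named in the summary, here is a minimal host-side sketch of the conversion in both directions. It is not MatX code: `to_planar` and `to_interleaved` are hypothetical helpers, `float` stands in for the half-precision element types, and the buffer is assumed to be one-dimensional and contiguous so that the imaginary plane starts exactly `n` elements after the real plane.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MatX): split interleaved complex values
// [re0, im0, re1, im1, ...] into a planar buffer [re0, re1, ..., im0, im1, ...].
std::vector<float> to_planar(const std::vector<std::complex<float>>& in) {
  const std::size_t n = in.size();
  std::vector<float> out(2 * n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = in[i].real();      // real plane occupies [0, n)
    out[i + n] = in[i].imag();  // imaginary plane occupies [n, 2n)
  }
  return out;
}

// Hypothetical inverse: rebuild interleaved values from a planar buffer.
// Assumes planar.size() is even.
std::vector<std::complex<float>> to_interleaved(const std::vector<float>& planar) {
  const std::size_t n = planar.size() / 2;  // number of complex elements
  std::vector<std::complex<float>> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = {planar[i], planar[i + n]};    // plane offset is n, not n / 2
  }
  return out;
}
```

Pre-staging a tensor in the planar form once, rather than converting on every GEMM call, is the saving the PR title refers to.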

Confidence Score: 4/5

Safe to merge except when MATX_EN_JIT is enabled — the interleaved JIT path will produce incorrect imaginary values until the plane-offset bug is fixed.

All three previously-raised P1 concerns are resolved. One new P1 correctness bug was found in the JIT path of ComplexInterleavedOp that must be fixed before this PR can be considered fully correct for JIT users.

include/matx/operators/interleaved.h (JIT offset bug at line 93)

Important Files Changed

| Filename | Overview |
|---|---|
| include/matx/operators/interleaved.h | New ComplexInterleavedOp (planar to interleaved conversion); JIT operator() uses wrong plane offset (out_dims_[rank_idx]/2 instead of out_dims_[rank_idx]) producing incorrect imaginary values in JIT mode. |
| include/matx/operators/planar.h | New ComplexPlanarOp (interleaved to planar conversion); non-JIT logic correct; JIT Size string has a redundant dead-code ternary (both branches identical) but no functional error. |
| include/matx/operators/set.h | EPT regression for non-planar SetOp fixed; planar output now forces EPT=ONE via is_planar_complex_v guard; PlanarComplexProxy assignment handled correctly. |
| include/matx/core/tensor_impl.h | LoadPlanarComplex/StorePlanarComplex added; TotalSize()-based offset is correct because constructor now asserts contiguous layout for planar tensors; PlanarComplexProxy provides safe read/write semantics. |
| include/matx/transforms/matmul/matmul_cuda.h | Planar-input path added for complex half GEMMs; c_adj.Reset(c.Data()) correctly updates pointer for pre-planar C; ldc set to c.Size(RANK-1) to match cuBLASLt planar expectations; batch stride correctly doubled for planar layout. |
| include/matx/core/tensor.h | ValidatePlanarLayoutOnCreate_ added to enforce contiguous and unit-innermost-stride constraints on planar complex tensors at construction time. |
| include/matx/core/half_complex.h | New matxFp16ComplexPlanar and matxBf16ComplexPlanar tag types added; provide operator= from non-planar base type to support assignment in set(). |
| include/matx/core/type_utils_both.h | is_planar_complex_v trait added; is_complex_half_v updated to also cover planar variants. |
| test/00_operators/planar_test.cu | New tests for planar transform and raw storage layout of matxFp16ComplexPlanar/matxBf16ComplexPlanar. |
| test/00_transform/MatMul.cu | Two new typed tests for planar-annotated complex half GEMMs validating correctness against interleaved reference. |
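The tensor.h and tensor_impl.h entries both rest on one constraint: a TotalSize()-based plane offset is only meaningful when the tensor is contiguous. The sketch below shows one way such a check could look. It is an illustration under stated assumptions, not the body of ValidatePlanarLayoutOnCreate_: the function name `is_row_major_contiguous` is hypothetical, strides are in elements, and the layout is assumed row-major.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check (not MatX API): returns true when `strides` describe a
// dense row-major layout for `shape`, which implies a unit innermost stride.
bool is_row_major_contiguous(const std::vector<long long>& shape,
                             const std::vector<long long>& strides) {
  if (shape.size() != strides.size()) return false;
  long long expected = 1;  // stride a dense layout would have at this dimension
  for (std::size_t d = shape.size(); d-- > 0;) {
    if (strides[d] != expected) return false;
    expected *= shape[d];
  }
  return true;
}
```

A strided or padded view fails this check, and for such a view "real plane size" and "distance to the imaginary plane" would no longer be the same number.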

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["matmul(A, B) to C (complex half)"] --> B{Any tensor already planar?}
    B -- "A not planar" --> C["planar(A) to a_hp; a_adj.Reset(a_hp)"]
    B -- "A already planar" --> D["a_adj unchanged"]
    C --> E
    D --> E
    B -- "B not planar" --> F["planar(B) to b_hp; b_adj.Reset(b_hp)"]
    B -- "B already planar" --> G["b_adj unchanged"]
    F --> H
    G --> H
    B -- "C not planar" --> I["Allocate c_hp; c_adj.Reset(c_hp)"]
    B -- "C already planar" --> J["c_adj.Reset(c.Data())"]
    I --> K
    J --> K
    E & H & K --> L["cuBLASLt GEMM with PLANE_OFFSET"]
    L --> M{c_is_planar?}
    M -- No --> N["interleaved(c_adj) to c"]
    M -- Yes --> O["No conversion needed"]
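The decisions in the flowchart can be restated as a small piece of host logic. This is a sketch of the control flow only, with hypothetical names (`Layout`, `StagingPlan`, `plan_staging`); it does not reproduce the actual code in matmul_cuda.h.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
enum class Layout { Interleaved, Planar };

struct StagingPlan {
  bool convert_a;       // run an interleaved-to-planar copy on A before the GEMM
  bool convert_b;       // same for B
  bool alloc_c_temp;    // GEMM writes into a temporary planar buffer for C
  bool convert_c_back;  // copy the temporary back into interleaved C afterwards
};

// Each operand is staged independently: only operands that are not already
// planar pay for a conversion.
StagingPlan plan_staging(Layout a, Layout b, Layout c) {
  StagingPlan plan{};
  plan.convert_a = (a != Layout::Planar);
  plan.convert_b = (b != Layout::Planar);
  plan.alloc_c_temp = (c != Layout::Planar);
  plan.convert_c_back = plan.alloc_c_temp;  // a temporary output must be copied back
  return plan;
}
```

When all three operands are already planar, every field is false and the GEMM runs with no extra copies.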

Comments Outside Diff (1)

  1. include/matx/operators/interleaved.h, line 93

    Wrong imaginary-plane offset in JIT path

    out_dims_[rank_idx] stores Size(rank_idx), which for ComplexInterleavedOp is already halved: op_.Size(rank_idx) / 2 = M. Dividing by 2 again gives M/2, so the imaginary element is fetched from the middle of the real plane rather than from base[i + M]. The non-JIT path correctly uses op_.Size(rank_idx) / 2 = M as the offset. The JIT path must use the full output-dimension value — not half of it.
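The effect of halving twice can be reproduced on a small host-side buffer. The sketch below is not the JIT code from interleaved.h: `fetch` is a hypothetical stand-in for the element load, and `float` replaces the half-precision types.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the operator's element load: reads the real part
// at base[i] and the imaginary part plane_offset elements later.
std::complex<float> fetch(const std::vector<float>& base,
                          std::size_t i,
                          std::size_t plane_offset) {
  return {base[i], base[i + plane_offset]};
}

// Planar input of length 2M with M = 4: reals {1, 3, 5, 7}, imaginaries {2, 4, 6, 8}.
// The output dimension is the input size halved, i.e. M.
inline std::vector<float> example_planar_input() {
  return {1.f, 3.f, 5.f, 7.f, 2.f, 4.f, 6.f, 8.f};
}
```

With `out_dim = 8 / 2 = 4`, `fetch(base, 1, out_dim)` yields (3, 4), while `fetch(base, 1, out_dim / 2)` yields (3, 7): the imaginary part is taken from the real plane, and no out-of-bounds access occurs to flag the error.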

Reviews (8): Last reviewed commit: "Fixed segfault"

Comment thread include/matx/operators/set.h
Comment thread include/matx/core/tensor_impl.h
Comment thread include/matx/transforms/matmul/matmul_cuda.h
@cliffburdick
Collaborator Author

/build

