Added planar types to speed up complex half precision GEMMs by cliffburdick · Pull Request #1142 · NVIDIA/MatX

cliffburdick · 2026-03-19T20:08:30Z

No description provided.

copy-pr-bot · 2026-03-19T20:08:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-03-19T20:15:43Z

Greptile Summary

This PR introduces matxFp16ComplexPlanar / matxBf16ComplexPlanar tag types and the companion planar() / interleaved() operators to allow users to pre-stage tensors in the split real/imaginary layout that cuBLASLt expects for complex half-precision GEMMs, avoiding a per-call conversion copy. The three previously-flagged issues (SetOp EPT regression, TotalSize offset for non-contiguous views, and c_adj leading-dimension mismatch) have all been addressed in the latest commits.
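For readers unfamiliar with the two layouts, here is a minimal host-side sketch of what planar() and interleaved() do conceptually. It assumes a 1-D buffer in which all M real parts are stored first and all M imaginary parts follow, so the imaginary element for index i sits at base[i + M] (as the review comment on interleaved.h describes). It uses std::complex<float> as a stand-in for the half-precision complex types, and the helpers to_planar and to_interleaved are hypothetical names, not MatX APIs.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helpers (not MatX APIs). std::complex<float> stands in for a
// complex half type. Assumed planar layout: M reals, then M imaginaries.
template <typename T>
std::vector<T> to_planar(const std::vector<std::complex<T>>& in) {
  const std::size_t m = in.size();
  std::vector<T> out(2 * m);
  for (std::size_t i = 0; i < m; ++i) {
    out[i] = in[i].real();      // real plane occupies [0, M)
    out[i + m] = in[i].imag();  // imaginary plane occupies [M, 2M)
  }
  return out;
}

template <typename T>
std::vector<std::complex<T>> to_interleaved(const std::vector<T>& planar) {
  const std::size_t m = planar.size() / 2;  // a planar buffer holds 2M scalars
  std::vector<std::complex<T>> out(m);
  for (std::size_t i = 0; i < m; ++i) {
    // The imaginary part of element i lives exactly M scalars after its real part.
    out[i] = std::complex<T>(planar[i], planar[i + m]);
  }
  return out;
}
```

A round trip through both helpers returns the original values. The point of the new tag types is to pay the to_planar cost once, up front, instead of on every GEMM call.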

P1: JIT imaginary-plane offset is wrong in interleaved.h:93. The expression out_dims_[rank_idx] / 2 evaluates to M/2 instead of M, because out_dims_ already stores the halved output size. When MATX_EN_JIT is enabled and interleaved() appears in a JIT graph, the imaginary component of every element is fetched from the wrong location, silently producing incorrect complex results.

Confidence Score: 4/5

Safe to merge except when MATX_EN_JIT is enabled: the interleaved JIT path will produce incorrect imaginary values until the plane-offset bug is fixed.

All three previously-raised P1 concerns are resolved. One new P1 correctness bug was found in the JIT path of ComplexInterleavedOp that must be fixed before this PR can be considered fully correct for JIT users.

include/matx/operators/interleaved.h (JIT offset bug at line 93)

Important Files Changed

Filename	Overview
include/matx/operators/interleaved.h	New ComplexInterleavedOp (planar to interleaved conversion); JIT operator() uses wrong plane offset (out_dims_[rank_idx]/2 instead of out_dims_[rank_idx]) producing incorrect imaginary values in JIT mode.
include/matx/operators/planar.h	New ComplexPlanarOp (interleaved to planar conversion); non-JIT logic correct; JIT Size string has a redundant dead-code ternary (both branches identical) but no functional error.
include/matx/operators/set.h	EPT regression for non-planar SetOp fixed; planar output now forces EPT=ONE via is_planar_complex_v guard; PlanarComplexProxy assignment handled correctly.
include/matx/core/tensor_impl.h	LoadPlanarComplex/StorePlanarComplex added; TotalSize()-based offset is correct because constructor now asserts contiguous layout for planar tensors; PlanarComplexProxy provides safe read/write semantics.
include/matx/transforms/matmul/matmul_cuda.h	Planar-input path added for complex half GEMMs; c_adj.Reset(c.Data()) correctly updates pointer for pre-planar C; ldc set to c.Size(RANK-1) to match cuBLASLt planar expectations; batch stride correctly doubled for planar layout.
include/matx/core/tensor.h	ValidatePlanarLayoutOnCreate_ added to enforce contiguous and unit-innermost-stride constraints on planar complex tensors at construction time.
include/matx/core/half_complex.h	New matxFp16ComplexPlanar and matxBf16ComplexPlanar tag types added; provide operator= from non-planar base type to support assignment in set().
include/matx/core/type_utils_both.h	is_planar_complex_v trait added; is_complex_half_v updated to also cover planar variants.
test/00_operators/planar_test.cu	New tests for planar transform and raw storage layout of matxFp16ComplexPlanar/matxBf16ComplexPlanar.
test/00_transform/MatMul.cu	Two new typed tests for planar-annotated complex half GEMMs validating correctness against interleaved reference.
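The table describes several cooperating pieces: a tag type with assignment from the base complex type, an is_planar_complex_v trait, a proxy for element reads and writes, a constructor-time contiguity check, and an imaginary-plane offset derived from TotalSize(). The sketch below shows one way such pieces could fit together in plain C++17. Every name here (CplxPlanar, PlanarProxy, PlanarMatrix) is hypothetical, std::complex<float> replaces the half types, and the real MatX implementation may differ.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <stdexcept>
#include <type_traits>
#include <vector>

using CplxBase = std::complex<float>;  // stand-in for an interleaved complex half type

// Hypothetical planar tag type; assignable from the non-planar base type.
struct CplxPlanar {
  float re = 0.f;
  float im = 0.f;
  CplxPlanar& operator=(const CplxBase& v) {
    re = v.real();
    im = v.imag();
    return *this;
  }
};

// Trait in the spirit of is_planar_complex_v: true only for the planar tag.
template <typename T> struct is_planar_complex : std::false_type {};
template <> struct is_planar_complex<CplxPlanar> : std::true_type {};
template <typename T>
inline constexpr bool is_planar_complex_v = is_planar_complex<T>::value;

// Proxy: there is no single address for a planar element, so reads gather the
// two planes and writes scatter into them.
class PlanarProxy {
 public:
  PlanarProxy(float* re, float* im) : re_(re), im_(im) {}
  operator CplxBase() const { return CplxBase(*re_, *im_); }
  PlanarProxy& operator=(const CplxBase& v) {
    *re_ = v.real();
    *im_ = v.imag();
    return *this;
  }

 private:
  float* re_;
  float* im_;
};

// Hypothetical 2-D planar view. Rejects non-contiguous strides at construction,
// which is what makes a single TotalSize()-based imaginary offset valid.
class PlanarMatrix {
 public:
  PlanarMatrix(float* data, std::size_t rows, std::size_t cols,
               std::size_t row_stride, std::size_t col_stride)
      : data_(data), rows_(rows), cols_(cols) {
    if (col_stride != 1 || row_stride != cols) {
      throw std::invalid_argument("planar tensor must be contiguous with unit innermost stride");
    }
  }
  std::size_t TotalSize() const { return rows_ * cols_; }
  PlanarProxy operator()(std::size_t r, std::size_t c) {
    const std::size_t idx = r * cols_ + c;
    return PlanarProxy(data_ + idx, data_ + TotalSize() + idx);  // imag plane starts at TotalSize()
  }

 private:
  float* data_;
  std::size_t rows_;
  std::size_t cols_;
};
```

If the view allowed a padded row stride, the element count would no longer match the distance to the imaginary plane, which is why the contiguity check and the TotalSize()-based offset go together.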

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["matmul(A, B) to C (complex half)"] --> B{Any tensor already planar?}
    B -- "A not planar" --> C["planar(A) to a_hp; a_adj.Reset(a_hp)"]
    B -- "A already planar" --> D["a_adj unchanged"]
    C --> E
    D --> E
    B -- "B not planar" --> F["planar(B) to b_hp; b_adj.Reset(b_hp)"]
    B -- "B already planar" --> G["b_adj unchanged"]
    F --> H
    G --> H
    B -- "C not planar" --> I["Allocate c_hp; c_adj.Reset(c_hp)"]
    B -- "C already planar" --> J["c_adj.Reset(c.Data())"]
    I --> K
    J --> K
    E & H & K --> L["cuBLASLt GEMM with PLANE_OFFSET"]
    L --> M{c_is_planar?}
    M -- No --> N["interleaved(c_adj) to c"]
    M -- Yes --> O["No conversion needed"]
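The flowchart's branching can be summarized as a small planning function. This is a hypothetical sketch that only records which steps would run for each combination of planar inputs and output; the step labels are descriptive strings, not MatX or cuBLASLt calls.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical planner mirroring the flowchart: lists the steps a complex-half
// matmul would take depending on which operands are already planar.
std::vector<std::string> plan_planar_gemm(bool a_planar, bool b_planar, bool c_planar) {
  std::vector<std::string> steps;
  if (!a_planar) steps.push_back("planar(A) -> a_hp");   // convert A, then a_adj.Reset(a_hp)
  if (!b_planar) steps.push_back("planar(B) -> b_hp");   // convert B, then b_adj.Reset(b_hp)
  steps.push_back(c_planar ? "c_adj.Reset(c.Data())"     // write straight into C
                           : "allocate c_hp");           // temporary planar output
  steps.push_back("cuBLASLt GEMM with PLANE_OFFSET");
  if (!c_planar) steps.push_back("interleaved(c_adj) -> C");  // convert result back
  return steps;
}
```

When A, B, and C are all pre-staged as planar, only the GEMM itself remains, which is the per-call conversion cost the PR aims to eliminate.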

Comments Outside Diff (1)

include/matx/operators/interleaved.h, line 93

Wrong imaginary-plane offset in JIT path

out_dims_[rank_idx] stores Size(rank_idx), which for ComplexInterleavedOp is already halved: op_.Size(rank_idx) / 2 = M. Dividing by 2 again gives M/2, so the imaginary element is fetched from the middle of the real plane rather than from base[i + M]. The non-JIT path correctly uses op_.Size(rank_idx) / 2 = M as the offset. The JIT path must use the full output-dimension value, not half of it.
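The factor-of-two error is easiest to see with concrete numbers. The sketch below is a hypothetical model of the index arithmetic, not the MatX JIT code: a planar buffer of 2M floats is viewed as M complex outputs, so the stored output size is already M.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the offset arithmetic (not MatX source).
struct InterleavedView {
  const float* base;     // planar buffer: reals in [0, M), imaginaries in [M, 2M)
  std::size_t out_size;  // analogous to out_dims_[rank_idx]; already equals M

  // Correct: the imaginary plane begins one full output size (M) past base.
  float imag(std::size_t i) const { return base[i + out_size]; }

  // Variant flagged in review: halves M a second time, so for i < M/2 it reads
  // from the real plane, and for larger i it reads the wrong imaginary element.
  float imag_buggy(std::size_t i) const { return base[i + out_size / 2]; }
};
```

With reals {1, 2, 3, 4} and imaginaries {10, 20, 30, 40}, the correct lookup for element 0 returns 10, while the halved offset returns 3, a real part.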

Reviews (8): Last reviewed commit: "Fixed segfault"

cliffburdick · 2026-03-19T21:04:14Z

/build

cliffburdick · 2026-03-20T15:41:57Z

/build

cliffburdick · 2026-03-20T21:05:22Z

/build

cliffburdick · 2026-04-03T16:16:17Z

/build

cliffburdick · 2026-04-06T22:43:02Z

/build

cliffburdick · 2026-04-08T16:35:55Z

/build

cliffburdick · 2026-04-10T18:56:51Z

/build

cliffburdick · 2026-04-14T21:35:23Z

/build

cliffburdick · 2026-04-15T22:04:49Z

/build

cliffburdick · 2026-04-15T23:04:30Z

/build

cliffburdick · 2026-04-16T16:10:27Z

/build

cliffburdick · 2026-04-16T18:00:36Z

/build

cliffburdick added 2 commits March 19, 2026 13:04

Added planar types to speed up complex half precision GEMMs

33ec90f

Cleanup

2507608

greptile-apps bot reviewed Mar 19, 2026

Comment thread include/matx/operators/set.h

Comment thread include/matx/core/tensor_impl.h

Comment thread include/matx/transforms/matmul/matmul_cuda.h

cliffburdick added 2 commits March 19, 2026 13:29

Code review updates

c47a6cc

Code review updates

59d5320

Compilation error

de287c9

Fix failing sparse and reshape unit tests

4da48da

More changes for affine indexing

10902a4

Fixed issue with teardown where context may die in tests before memory is freed

1ad93b0

Fixed segfault

3386a58
