
fix(pass): Repair matrix layout for row-major ops#1231

Open
lwDavid wants to merge 1 commit into hw-native-sys:main from lwDavid:issue-1229-reproduce-texp-gm-blayout

lwDavid commented Apr 30, 2026

Summary

Fixes #1229.

ResolveBackendOpLayouts now repairs general non-row-major matrix tiles before backend ops that require row_major layout. Previously the pass only handled [N, 1] column-vector reshape repair, so full matrix tiles produced by paths such as matmul/tpop -> Vec -> neg -> exp could still reach pto.texp with blayout=col_major, causing ptoas to fail.

Root Cause

The backend layout spec for ops like tile.exp requires row-major inputs and outputs. The old repair pass only rewrote [N, 1] col-major vectors into [1, N] row-major views via tile.reshape. It skipped general matrix tiles such as [16, 256] with blayout=col_major, slayout=row_major, so the generated PTO kept a non-row-major source for pto.texp.
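The difference between the old and new repair gates can be sketched in C++. This is a minimal illustration, not the pass's actual code: `TileLayout`, `TileView`, and `TileType` are simplified hypothetical stand-ins that borrow the field names visible in the review snippets (`shape_`, `tile_view_`, `blayout`, `slayout`, `valid_shape`), and `OldNeedsRepair`/`NewNeedsRepair` are illustrative helpers. The sketch assumes that only `blayout` decides whether a tile counts as row-major.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for the project's IR tile types.
enum class TileLayout { row_major, col_major, none_box };

struct TileView {
  TileLayout blayout = TileLayout::row_major;  // a default view is row-major
  TileLayout slayout = TileLayout::none_box;
  std::vector<int64_t> valid_shape;
};

struct TileType {
  std::vector<int64_t> shape_;
  std::optional<TileView> tile_view_;  // no view means implicitly row-major
};

TileLayout GetBLayout(const TileType& t) {
  return t.tile_view_ ? t.tile_view_->blayout : TileLayout::row_major;
}

// A [N, 1] tile is treated as a column vector.
bool IsColumnVector(const TileType& t) {
  return t.shape_.size() == 2 && t.shape_[1] == 1;
}

// Old gate: only [N, 1] col-major vectors were considered for repair.
bool OldNeedsRepair(const TileType& t) {
  return IsColumnVector(t) && GetBLayout(t) == TileLayout::col_major;
}

// New gate: any tile whose blayout is not row-major needs repair before an op
// that requires row-major operands, whatever its shape.
bool NewNeedsRepair(const TileType& t) {
  return GetBLayout(t) != TileLayout::row_major;
}
```

Under these assumptions, a [16, 256] tile with blayout=col_major, slayout=row_major (the #1229 case) passes the new gate but slips through the old one.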

Changes

  • Extend ResolveBackendOpLayouts to detect any constrained tile input/output that is not row-major.
  • Preserve the existing [N, 1] vector reshape fast path.
  • Insert same-memory tile.move(..., blayout=row_major, slayout=none_box) for general non-row-major matrix inputs.
  • Restore original result layout after constrained ops using tile.reshape for column vectors or tile.move for general matrix tiles.
  • Add regression coverage for tile.exp on a col-major matrix tile.
  • Update English and Chinese pass documentation.
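A sketch of how the input-side choice between the two repairs might look, using the same simplified hypothetical types. `PlanInputRepair` and `InputRepair` are invented for illustration and are not the pass's NeedsInputRepair helper. The column-vector fast path yields a `tile.reshape` to `[1, N]`; every other non-row-major tile yields a same-memory `tile.move` with `blayout=row_major, slayout=none_box`, as listed in the changes.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the project's IR tile types.
enum class TileLayout { row_major, col_major, none_box };

struct TileView {
  TileLayout blayout = TileLayout::row_major;
  TileLayout slayout = TileLayout::none_box;
  std::vector<int64_t> valid_shape;
};

struct TileType {
  std::vector<int64_t> shape_;
  std::optional<TileView> tile_view_;
};

// Hypothetical description of the op inserted before the constrained op.
struct InputRepair {
  std::string op;                     // "tile.reshape" or "tile.move"
  std::vector<int64_t> new_shape;     // used by tile.reshape only
  std::optional<TileLayout> blayout;  // used by tile.move only
  std::optional<TileLayout> slayout;  // used by tile.move only
};

bool IsColumnVector(const TileType& t) {
  return t.shape_.size() == 2 && t.shape_[1] == 1;
}

std::optional<InputRepair> PlanInputRepair(const TileType& t) {
  if (!t.tile_view_ || t.tile_view_->blayout == TileLayout::row_major) {
    return std::nullopt;  // already row-major: nothing to insert
  }
  if (IsColumnVector(t) && t.tile_view_->blayout == TileLayout::col_major) {
    // Fast path kept from the old pass: view [N, 1] col-major as [1, N] row-major.
    return InputRepair{"tile.reshape", {1, t.shape_[0]}, std::nullopt, std::nullopt};
  }
  // General matrix: same-memory move into a row-major layout.
  return InputRepair{"tile.move", {}, TileLayout::row_major, TileLayout::none_box};
}
```

Keeping the reshape fast path for vectors avoids a move when a pure view change is enough; the move is only needed when the data is a genuine 2-D matrix.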

Fixes hw-native-sys#1229

ResolveBackendOpLayouts now inserts same-memory tile.move layout repairs for non-row-major matrix tiles before backend ops that require row_major inputs, while preserving the existing column-vector reshape path. Restores the original result layout after constrained ops and adds regression coverage for tile.exp on a col_major matrix tile.

coderabbitai Bot commented Apr 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5ded5451-6cca-4ef6-a605-5094644116d3

📥 Commits

Reviewing files that changed from the base of the PR and between 8a72fc3 and 0ca7027.

📒 Files selected for processing (4)
  • docs/en/dev/passes/16-resolve_backend_op_layouts.md
  • docs/zh-cn/dev/passes/16-resolve_backend_op_layouts.md
  • src/ir/transforms/resolve_backend_op_layouts_pass.cpp
  • tests/ut/ir/transforms/test_resolve_backend_op_layouts_pass.py

📝 Walkthrough

The ResolveBackendOpLayouts pass is extended from handling only reshape-based repair of [N,1] column-major vectors to a broader constrained-layout repair strategy. For tiles requiring row-major layout, the pass now uses tile.reshape for column vectors and tile.move for general matrices. Output restoration is correspondingly updated to conditionally use either reshape or move based on tile structure. Documentation and unit tests are updated to reflect this enhancement.
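The output side mirrors the input repair. Below is a hedged sketch with the same hypothetical simplified types and an invented `PlanOutputRestore` helper: given the result type the op had before repair, it restores column vectors with a `tile.reshape` back to their original `[N, 1]` shape and other tiles with a `tile.move` back to their original view. It assumes the constrained op yields a row-major result once its inputs are repaired.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the project's IR tile types.
enum class TileLayout { row_major, col_major, none_box };

struct TileView {
  TileLayout blayout = TileLayout::row_major;
  TileLayout slayout = TileLayout::none_box;
  std::vector<int64_t> valid_shape;
};

struct TileType {
  std::vector<int64_t> shape_;
  std::optional<TileView> tile_view_;
};

// Hypothetical description of the op inserted after the constrained op.
struct OutputRestore {
  std::string op;                       // "tile.reshape" or "tile.move"
  std::vector<int64_t> new_shape;       // tile.reshape: original [N, 1] shape
  std::optional<TileView> target_view;  // tile.move: original view to restore
};

bool IsColumnVector(const TileType& t) {
  return t.shape_.size() == 2 && t.shape_[1] == 1;
}

// `original` is the result type before repair.
std::optional<OutputRestore> PlanOutputRestore(const TileType& original) {
  if (!original.tile_view_ || original.tile_view_->blayout == TileLayout::row_major) {
    return std::nullopt;  // result was already row-major: nothing to restore
  }
  if (IsColumnVector(original)) {
    return OutputRestore{"tile.reshape", original.shape_, std::nullopt};
  }
  return OutputRestore{"tile.move", {}, original.tile_view_};
}
```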

Changes

  • Documentation (docs/en/dev/passes/16-resolve_backend_op_layouts.md, docs/zh-cn/dev/passes/16-resolve_backend_op_layouts.md): Updated pass semantics from [N,1]-specific reshape repair to broader constrained-layout repair, describing new NeedsInputRepair/NeedsOutputRepair helpers, tile.move insertion for non-row-major tiles, and conditional output restoration logic.
  • Implementation (src/ir/transforms/resolve_backend_op_layouts_pass.cpp): Replaces IsRepairableCall gating with NeedsInputRepair/NeedsOutputRepair helpers. Input repair logic distinguishes column vectors (repaired via tile.reshape) from general matrices (repaired via tile.move with blayout/slayout kwargs). Output restoration is made conditional: tile.reshape for column vectors, tile.move for other tiles. Modified GetTileLayout to tolerate a missing tile_view_.
  • Tests (tests/ut/ir/transforms/test_resolve_backend_op_layouts_pass.py): Added a new unit test for tile.exp on a non-vector col_major tile, verifying tile.move-based layout coercion and restoration via a move back to the original layout.

Sequence Diagram

sequenceDiagram
    participant Pass as ResolveBackendOpLayouts
    participant Check as Layout Checker
    participant Repair as Repair Logic
    participant IR as IR Builder
    
    Pass->>Check: Check NeedsInputRepair for tile
    Check-->>Pass: True if input needs row_major
    Pass->>Repair: Determine repair method
    alt Column Vector + col_major
        Repair->>IR: Insert tile.reshape(arg, [1,N])
    else General Matrix + non-row_major
        Repair->>IR: Insert tile.move with blayout=row_major
    end
    IR-->>Pass: Repaired operand
    Pass->>Pass: Rebuild constrained call on repaired operands
    Pass->>Check: Check output restoration needs
    Check-->>Pass: Determine restoration method
    alt Column Vector Result
        Repair->>IR: Insert tile.reshape back
    else General Matrix Result
        Repair->>IR: Insert tile.move back to original blayout
    end
    IR-->>Pass: Restored tile
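Putting both halves together for a single unary elementwise op gives the sequence the diagram describes. The sketch below is hypothetical: it renders the rewrite as strings instead of building IR, `RewriteUnaryOp` is an invented name, the tile types are simplified stand-ins, and it assumes the op's result has the same type as its input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the project's IR tile types.
enum class TileLayout { row_major, col_major, none_box };

struct TileView {
  TileLayout blayout = TileLayout::row_major;
  TileLayout slayout = TileLayout::none_box;
  std::vector<int64_t> valid_shape;
};

struct TileType {
  std::vector<int64_t> shape_;
  std::optional<TileView> tile_view_;
};

std::string ToString(TileLayout l) {
  switch (l) {
    case TileLayout::row_major: return "row_major";
    case TileLayout::col_major: return "col_major";
    case TileLayout::none_box: return "none_box";
  }
  return "unknown";
}

std::string ShapeToString(const std::vector<int64_t>& shape) {
  std::string out = "[";
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (i > 0) out += ", ";
    out += std::to_string(shape[i]);
  }
  return out + "]";
}

// Returns the op sequence emitted for `op` applied to a tile of type `in`.
std::vector<std::string> RewriteUnaryOp(const std::string& op, const TileType& in) {
  if (!in.tile_view_ || in.tile_view_->blayout == TileLayout::row_major) {
    return {op};  // already row-major: no repair, no restore
  }
  const TileView& view = *in.tile_view_;
  bool column_vector = in.shape_.size() == 2 && in.shape_[1] == 1;
  if (column_vector && view.blayout == TileLayout::col_major) {
    // Reshape [N, 1] -> [1, N], run the op, reshape the result back.
    return {"tile.reshape(" + ShapeToString({1, in.shape_[0]}) + ")", op,
            "tile.reshape(" + ShapeToString(in.shape_) + ")"};
  }
  // Move into row-major, run the op, move the result back to the original view.
  return {"tile.move(blayout=row_major, slayout=none_box)", op,
          "tile.move(blayout=" + ToString(view.blayout) +
              ", slayout=" + ToString(view.slayout) + ")"};
}
```

For the #1229 shape, a [16, 256] tile with blayout=col_major, slayout=row_major, this yields a move to row_major, the tile.exp, and a move back to col_major/row_major.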

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

🐰 Tiles shuffle left and right so true,
From col-major to row-major's hue,
A reshape here, a move-call there,
Layout repairs with utmost care!
The GM path now flows with grace—
No more blayout errors to face. 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 10.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title 'fix(pass): Repair matrix layout for row-major ops' accurately describes the primary change: extending ResolveBackendOpLayouts to repair non-row-major matrix tiles before backend ops requiring row_major layout.
  • Description check: ✅ Passed. The description is directly related to the changeset, clearly explaining the root cause of issue #1229 (col-major matrix tiles not being repaired), the solution (extending the repair pass), and the specific implementation changes (tile.move for matrices, tile.reshape for vectors).
  • Linked Issues check: ✅ Passed. The PR fully addresses the linked issue #1229: it extends ResolveBackendOpLayouts to repair non-row-major matrix tiles before ops like pto.texp, directly fixing the root cause where GM-routed data reached backend ops without row_major blayout.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to fixing issue #1229: documentation updates explain the new repair logic, the pass implementation handles matrix layout coercion, and the new test validates the fix for non-vector col-major tiles.


gemini-code-assist Bot left a comment

Code Review

This pull request enhances the ResolveBackendOpLayouts pass to support general matrix layout coercion. Previously, the pass only handled [N, 1] column-major vectors by reshaping them into row-major views; it now uses tile.move to coerce arbitrary non-row-major tiles into the required row_major layout and ensures the original layout is restored for assignment results. The review feedback identifies opportunities to optimize the C++ implementation by avoiding unnecessary heap allocations caused by constructing temporary TileView objects during layout retrieval and restoration.

    return TileLayout::row_major;
  }
- return tile_type->tile_view_->blayout;
+ return tile_type->tile_view_.value_or(TileView{}).blayout;

medium

Using value_or(TileView{}) creates a temporary TileView object, which includes a std::vector for valid_shape. This can lead to unnecessary heap allocations. Since a tile without an explicit view is implicitly row-major in this project, it is more efficient to use a ternary operator or an explicit check to avoid constructing the temporary.

Suggested change
- return tile_type->tile_view_.value_or(TileView{}).blayout;
+ return tile_type->tile_view_ ? tile_type->tile_view_->blayout : TileLayout::row_major;
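To see why the suggestion matters: `std::optional::value_or` receives its fallback as an argument, so `TileView{}` is built on every call, and it returns the view by value, so an engaged optional is copied (including `valid_shape`). The sketch below instruments a hypothetical, simplified `TileView` with a construction counter, used only for this illustration, to show that the ternary form builds no `TileView` at all.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

enum class TileLayout { row_major, col_major, none_box };

// Hypothetical TileView, instrumented for this sketch only: every constructor
// bumps `constructions`, so temporaries and copies become observable.
struct TileView {
  TileLayout blayout = TileLayout::row_major;
  TileLayout slayout = TileLayout::none_box;
  std::vector<int64_t> valid_shape;

  static inline int constructions = 0;

  TileView() { ++constructions; }
  TileView(TileLayout b, TileLayout s, std::vector<int64_t> vs)
      : blayout(b), slayout(s), valid_shape(std::move(vs)) {
    ++constructions;
  }
  TileView(const TileView& other)
      : blayout(other.blayout), slayout(other.slayout), valid_shape(other.valid_shape) {
    ++constructions;
  }
  TileView& operator=(const TileView&) = default;
};

struct TileType {
  std::vector<int64_t> shape_;
  std::optional<TileView> tile_view_;
};

// Pattern flagged in review: builds TileView{} and returns a copy of the view.
TileLayout GetTileLayoutValueOr(const TileType& t) {
  return t.tile_view_.value_or(TileView{}).blayout;
}

// Suggested pattern: reads blayout in place; no TileView is constructed.
TileLayout GetTileLayoutTernary(const TileType& t) {
  return t.tile_view_ ? t.tile_view_->blayout : TileLayout::row_major;
}
```

Both functions return the same layout; only the ternary avoids constructing and copying views on the hot path.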

if (IsColumnVector(result_tile_type)) {
  restore_call = CreateReshapeCall(row_major_var, result_tile_type->shape_, call->span_);
} else {
  auto target_view = result_tile_type->tile_view_.value_or(TileView{});
medium

Similar to the efficiency concern in GetTileLayout, using value_or(TileView{}) here creates a temporary object. Since this code path is only reached if result_tile_type is not row-major (which implies it must have a tile_view_ with a non-row-major layout), you can safely use .value() to access the existing view and avoid the overhead of constructing a default TileView.

Suggested change
- auto target_view = result_tile_type->tile_view_.value_or(TileView{});
+ const auto& target_view = result_tile_type->tile_view_.value();
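A sketch of the suggested restore path, with hypothetical simplified types and an invented `RestoreKwargs` helper. Calling `.value()` on an lvalue optional returns a reference to the stored view, so binding it to `const auto&` copies nothing. It also makes the precondition explicit: if a result without a view ever reached this branch, `.value()` would throw `std::bad_optional_access` instead of silently restoring to a default row-major view, which is what `value_or(TileView{})` would do.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for the project's IR tile types.
enum class TileLayout { row_major, col_major, none_box };

struct TileView {
  TileLayout blayout = TileLayout::row_major;
  TileLayout slayout = TileLayout::none_box;
  std::vector<int64_t> valid_shape;
};

struct TileType {
  std::vector<int64_t> shape_;
  std::optional<TileView> tile_view_;
};

// Hypothetical kwargs for the restoring tile.move.
struct MoveKwargs {
  TileLayout blayout;
  TileLayout slayout;
};

// Precondition (from the review): this branch runs only when the result is not
// row-major, which implies it carries an explicit tile_view_.
MoveKwargs RestoreKwargs(const TileType& result) {
  // value() on an lvalue optional returns a reference: nothing is copied.
  // If the precondition is ever broken, it throws std::bad_optional_access.
  const auto& target_view = result.tile_view_.value();
  return {target_view.blayout, target_view.slayout};
}
```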


Labels

bug Something isn't working

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

[Bug] pto.texp op fails with 'expects src to use the row_major blayout' when data flows through GM

1 participant