test(st): Add Mat slice to Left matmul pattern system test #1225

Open
Crystal-wzy wants to merge 1 commit into hw-native-sys:main from Crystal-wzy:bugfix

@Crystal-wzy Crystal-wzy commented Apr 29, 2026

Summary

  • Add TestMatSliceToLeft PTOTestCase validating the wide Mat load → slice → Left matmul pattern for issue #1198 ("[Feature] Support target_memory on pl.slice (or fuse slice+move) for Mat→Left/Right path"). A single [M, 2K] BF16 tile is loaded into Mat to satisfy the 512 B GM row-alignment constraint, sliced into two [M, K] subviews, moved to Left, and used in a K-split matmul + matmul_acc accumulation
  • Add TestMatSliceToLeftSuite pytest suite parametrized over PLATFORMS with PyTorch golden reference (A[:,:K]@b0 + A[:,K:]@b1)
  • Codegen expectation: pl.slice on Mat tile → pto.subview (zero-copy), pl.move from subview → pto.tmov (Mat subview → Left)
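The golden reference in the summary, A[:,:K]@b0 + A[:,K:]@b1, can be illustrated with a small pure-Python sketch. It is not the test's code: the helper names `matmul`, `matmul_acc`, and `k_split_matmul` and the tiny integer shapes are hypothetical, chosen only to show that accumulating the two half-products equals one matmul against b0 stacked on b1.

```python
# Pure-Python sketch of the K-split accumulation. Shapes and helper names
# are hypothetical; the PR itself uses BF16 tiles and a PyTorch reference.

def matmul(a, b):
    """Return a @ b for row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matmul_acc(acc, a, b):
    """Return acc + a @ b, mirroring a matmul_acc-style accumulate step."""
    prod = matmul(a, b)
    return [[c + p for c, p in zip(acc_row, prod_row)]
            for acc_row, prod_row in zip(acc, prod)]


def k_split_matmul(a_wide, b0, b1, k):
    """Slice a_wide [M, 2K] into two [M, K] halves and accumulate both products."""
    a_lo = [row[:k] for row in a_wide]  # A[:, :K]
    a_hi = [row[k:] for row in a_wide]  # A[:, K:]
    return matmul_acc(matmul(a_lo, b0), a_hi, b1)


K = 2
a_wide = [[1, 2, 3, 4], [5, 6, 7, 8]]  # [M, 2K]
b0 = [[1, 0], [0, 1]]                  # [K, N]
b1 = [[2, 0], [0, 2]]                  # [K, N]

result = k_split_matmul(a_wide, b0, b1, K)
# Equivalent single matmul against b0 stacked on top of b1 ([2K, N]).
reference = matmul(a_wide, b0 + b1)
assert result == reference
```

Integer inputs keep the comparison exact; with BF16 inputs and FP32 accumulation a tolerance-based comparison would be needed instead.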

Testing

  • Parametrized test passes on 910B PTO hardware
  • PyTorch golden reference validated
  • Pre-commit hooks pass (ruff, pyright)

Closes #1198


coderabbitai Bot commented Apr 29, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 930c3bee-d6c9-4887-9050-d89bd866fa0d

📥 Commits

Reviewing files that changed from the base of the PR and between fcc9c07 and 3aa67fd.

📒 Files selected for processing (1)
  • tests/st/runtime/test_mat_slice_to_left.py
📝 Walkthrough

Adds a new runtime test that validates Mat → slice → move-to-Left → matmul(+matmul_acc) codegen, and updates the call-direction derivation pass to avoid promoting certain Out args to InOut when writes are disjoint-store patterns (with a per-callee/arg cache). Also adds a unit test for the R-seq exception case.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Runtime test: tests/st/runtime/test_mat_slice_to_left.py | New PyTest runtime test that compiles/runs a program loading a wide BF16 Mat tile, slices it into two Left subviews, loads two Right tiles, performs matmul + matmul_acc, stores FP32 result, and asserts correctness vs. PyTorch reference. |
| Call-direction analysis: src/ir/transforms/derive_call_directions_pass.cpp | Added transform_utils import, a disjoint-store analysis and memoizing disjoint_store_cache_, and updated CallDirectionMutator logic to skip R-seq promotion to InOut when a callee/arg exhibits disjoint tile.store behavior. |
| Unit test: tests/ut/ir/transforms/test_derive_call_directions.py | New unit test covering the R-seq "exception" case: verifies an Out param remains OutputExisting inside a sequential loop when writes use tile.store offsets dependent on another param, and checks intermediate scalar direction and single kernel call. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

🐰 I nibble code and hop with glee,
Wide Mat splits tuck snug as can be.
Left and Right now dance in pair,
Accumulating sums with flair,
Rabbit cheers — the tests agree! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a test for the Mat slice to Left matmul pattern from issue #1198. |
| Linked Issues check | ✅ Passed | The PR adds a system test validating the Mat slice to Left matmul pattern required by issue #1198, implementing the test case that demonstrates the desired slice-to-Left behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on adding a new runtime test module for the Mat slice to Left pattern; no out-of-scope modifications to unrelated code. |
| Description check | ✅ Passed | The pull request description comprehensively relates to the changeset, detailing the new test case for the Mat slice to Left matmul pattern, including the tensor shapes, program structure, expected codegen behavior, and testing validation. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/st/runtime/test_mat_slice_to_left.py (1)

131-138: Assert codegen pattern explicitly, not only result.passed.

This test currently proves correctness/compilability, but it doesn’t explicitly lock the regression target (pl.slice(Mat) -> pto.subview, then pl.move(subview, Left) -> pto.tmov). Adding codegen-text assertions here would make the test much more robust against future lowering drift.
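One way such an assertion could look is sketched below. The helper name `assert_slice_to_left_lowering` is hypothetical, and the sample text is made up for illustration rather than real PTO output; how the test runner exposes generated code is not shown in this PR, so the helper only takes a plain string.

```python
import re


def assert_slice_to_left_lowering(ir_text: str) -> None:
    """Check that a pto.subview appears and that a pto.tmov follows it."""
    subview = re.search(r"\bpto\.subview\b", ir_text)
    assert subview is not None, "expected pl.slice on Mat to lower to pto.subview"
    # Only search after the subview so the ordering is part of the check.
    tmov = re.search(r"\bpto\.tmov\b", ir_text[subview.end():])
    assert tmov is not None, "expected pl.move from the subview to lower to pto.tmov"


# Made-up text standing in for generated code; not real PTO output.
sample_ir = """
%a_lo = pto.subview %a_mat
%a_left = pto.tmov %a_lo
"""
assert_slice_to_left_lowering(sample_ir)
```

Matching on op names rather than full lines keeps the check tolerant of operand renaming while still failing if the slice stops lowering to a subview.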

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/st/runtime/test_mat_slice_to_left.py` around lines 131 - 138, The test
should not only assert result.passed but also verify the generated IR/text
contains the expected lowering patterns: check that the lowering of
TestMatSliceToLeft includes a pl.slice(Mat) -> pto.subview sequence and a
pl.move(subview, Left) -> pto.tmov sequence; after calling
test_runner.run(TestMatSliceToLeft) use the runner's returned artifact that
exposes generated code (e.g., result.codegen or result.ir) and add assertions
that those exact substrings/patterns appear (or use regexes) to lock the
regression target in the TestMatSliceToLeft test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/st/runtime/test_mat_slice_to_left.py`:
- Line 18: Docstring text contains the Unicode multiplication sign "×" which
triggers Ruff RUF002; update the docstring line containing the phrase "K-chunks
into one wider GM load (row = 2 × 128 × 2 = 512 B) and splitting" in
tests/st/runtime/test_mat_slice_to_left.py to use the ASCII letter "x" (e.g.,
"row = 2 x 128 x 2 = 512 B") so the comment no longer contains ambiguous Unicode
characters.

---

Nitpick comments:
In `@tests/st/runtime/test_mat_slice_to_left.py`:
- Around line 131-138: The test should not only assert result.passed but also
verify the generated IR/text contains the expected lowering patterns: check that
the lowering of TestMatSliceToLeft includes a pl.slice(Mat) -> pto.subview
sequence and a pl.move(subview, Left) -> pto.tmov sequence; after calling
test_runner.run(TestMatSliceToLeft) use the runner's returned artifact that
exposes generated code (e.g., result.codegen or result.ir) and add assertions
that those exact substrings/patterns appear (or use regexes) to lock the
regression target in the TestMatSliceToLeft test.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5b292123-43d0-45ab-99fe-5b4291d8f8b9

📥 Commits

Reviewing files that changed from the base of the PR and between 715927a and fcc9c07.

📒 Files selected for processing (1)
  • tests/st/runtime/test_mat_slice_to_left.py

Comment thread tests/st/runtime/test_mat_slice_to_left.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new system test, test_mat_slice_to_left.py, which validates a specific memory access pattern: loading a wide BF16 tile into Mat memory to meet alignment requirements, slicing it into subviews, and moving those subviews to Left memory for matmul operations. This test specifically targets scenarios encountered in Qwen3 decode kernels. I have no feedback to provide.
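The alignment motivation can be checked with simple arithmetic. The sketch below assumes the constraint means each GM row's byte length must be a multiple of 512 B, and takes K = 128 from the "2 x 128 x 2 = 512 B" figure quoted in the test's docstring; the constant and function names are hypothetical.

```python
BF16_BYTES = 2      # bytes per BF16 element
GM_ROW_ALIGN = 512  # row-alignment constraint quoted in the PR, in bytes
K = 128             # K-chunk width implied by "2 x 128 x 2 = 512 B"


def row_bytes(cols: int, elem_bytes: int = BF16_BYTES) -> int:
    """Byte length of one row with `cols` elements."""
    return cols * elem_bytes


narrow = row_bytes(K)    # one [M, K] row
wide = row_bytes(2 * K)  # one [M, 2K] row

# Under the assumed rule, a single K-chunk row is too short to be aligned,
# while the merged 2K row meets the constraint exactly.
assert narrow % GM_ROW_ALIGN != 0
assert wide % GM_ROW_ALIGN == 0
```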

@Crystal-wzy Crystal-wzy force-pushed the bugfix branch 4 times, most recently from ee59324 to 70dd9e9 on April 29, 2026 08:42
@Crystal-wzy Crystal-wzy force-pushed the bugfix branch 2 times, most recently from 17bdc62 to 3aa67fd on April 30, 2026 03:06