fix(codegen): Reuse matmul acc buffer by lwDavid · Pull Request #1216 · hw-native-sys/pypto

lwDavid · 2026-04-29T03:51:55Z

Lower in-place tile accumulator ops with the accumulator SSA as both ins and outs, and keep the assignment result bound to that same buffer so the final store reads the updated accumulator.
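A minimal sketch of the lowering described above. It uses a hypothetical `ToyAccCodegen`, which is not pypto's real codegen API. The accumulator's SSA name goes into both `ins(...)` and `outs(...)`, and the assignment result is rebound to that SSA so the final store reads the updated buffer. Operand types are omitted, so the emitted line format is only illustrative.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model (not pypto's codegen): an in-place accumulate op
// reuses the accumulator SSA for ins(...) and outs(...), and the result
// variable is bound to that same SSA instead of a freshly allocated buffer.
struct ToyAccCodegen {
  std::map<std::string, std::string> var_to_ssa;  // IR variable -> SSA name

  std::string EmitMatmulAcc(const std::string& result_var, const std::string& acc_var,
                            const std::string& lhs_var, const std::string& rhs_var) {
    const std::string acc = var_to_ssa.at(acc_var);
    // Bind the assignment result to the accumulator buffer (no new alloc).
    var_to_ssa[result_var] = acc;
    return "pto.tmatmul.acc ins(" + acc + ", " + var_to_ssa.at(lhs_var) + ", " +
           var_to_ssa.at(rhs_var) + ") outs(" + acc + ")";
  }

  // A later store of the result reads whatever SSA the result is bound to.
  std::string EmitStore(const std::string& var) const {
    return "pto.tstore ins(" + var_to_ssa.at(var) + ")";
  }
};
```

With `c` bound to `%acc`, `EmitMatmulAcc("c1", "c", "a", "b")` produces `pto.tmatmul.acc ins(%acc, %a, %b) outs(%acc)`, and `EmitStore("c1")` then reads `%acc`, the buffer that was just updated.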

coderabbitai · 2026-04-29T03:52:07Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.


📝 Walkthrough

Ensure in-place accumulator codegen uses the accumulator operand's tile-buffer SSA and type for tile.matmul_acc / tile.gemv_acc, bind the assignment result to that SSA, and prevent duplicate pto.alloc_tile emissions; add unit and runtime tests validating accumulator SSA preservation across loop-carried matmul_acc.

Changes

Cohort / File(s)	Summary
Backend in-place accumulation codegen `src/backend/common/pto_ops_common.cpp`	Extract accumulator destination SSA/type from the IR accumulator input (`op->args_[0]`), validate presence of a tile-buffer SSA, call `codegen.SetCurrentResultBuf(dst)` to bind result to that SSA, and emit `pto.tmatmul.acc`/`pto.tgemv.acc` with matching `ins(acc, ...)` and `outs(acc)`.
PTO codegen allocation & SSA reuse `src/codegen/pto/pto_codegen.cpp`, `include/pypto/codegen/pto/pto_codegen.h`	Suppress automatic tile allocation for in-place accumulator destinations during `AssignStmt` handling; add conditional reuse of tile-buffer SSA for statically-shaped operand MemRefs keyed by base MemRef + extents; add `emitted_tile_alloc_ssas` to track and deduplicate emitted `pto.alloc_tile` by SSA.
Unit & runtime tests `tests/ut/codegen/test_pto_codegen.py`, `tests/st/runtime/test_matmul.py`	New MLIR helper for single-function output and tests: unit test checks SSA consistency/reuse across `pto.tmatmul` → `pto.tmatmul.acc` → `pto.tstore`; runtime regression adds matmul_acc case with `b_trans=True` splitting K to validate numerical correctness on a2a3 (FP32/BF16).
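The deduplication of `pto.alloc_tile` by SSA could look roughly like the sketch below. `AllocEmitter` and the emitted line format are hypothetical; only the idea of a set named `emitted_tile_alloc_ssas` comes from the summary above. When several IR tile variables map to one SSA name, only the first one gets an alloc.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: emit at most one alloc line per SSA name.
class AllocEmitter {
 public:
  // Returns true if an alloc line was emitted, false if it was suppressed
  // because this SSA name already has one.
  bool EmitAllocOnce(const std::string& ssa, const std::string& tile_type) {
    // std::set::insert reports via .second whether the name was new.
    if (!emitted_tile_alloc_ssas_.insert(ssa).second) return false;
    lines_.push_back(ssa + " = pto.alloc_tile : " + tile_type);
    return true;
  }

  const std::vector<std::string>& lines() const { return lines_; }

 private:
  std::set<std::string> emitted_tile_alloc_ssas_;
  std::vector<std::string> lines_;
};
```

Keying the set on the SSA name rather than the IR variable means that any variables deliberately bound to a shared buffer (such as a loop-carried accumulator) share a single allocation.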

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix(codegen): remove obsolete tmov copy in accumulation op codegen #696: Modifies the same in-place accumulation codegen path; contrasts with this PR's approach by using the assignment result as accumulator operand — strong overlap in touched logic.
fix(codegen,memory): fix matmulacc output mismatch on Ascend NPU #537: Changes SSA/allocation handling for tile.matmul_acc/tile.gemv_acc, addressing accumulator/destination sharing similarly to this PR.
fix(codegen): prevent SSA name dedup for tile vars in sibling if-else branches #721: Adds tile-buffer SSA reuse/dedup logic in codegen; overlaps with this PR's SSA reuse and emitted-alloc tracking changes.

Suggested reviewers

lyfne123
zhangqi-chen

Poem

🐰 I hop through SSA fields at dawn,
I bind the acc where it was drawn,
No duped tiles to crowd the way,
Loop-carried buffers save the day;
Tests nibble green — the numbers sing! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix(codegen): Reuse matmul acc buffer' directly and concisely summarizes the main change: fixing the codegen to properly reuse the accumulator buffer in matmul operations.
Description check	✅ Passed	The description explains the fix (using accumulator SSA as both ins and outs) and its purpose (ensuring the final store reads the updated accumulator), and references issue `#1213` which matches the problem addressed in the PR.
Linked Issues check	✅ Passed	The PR directly addresses issue `#1213` by fixing accumulator buffer handling in tile.matmul_acc and tile.gemv_acc codegen. Changes to pto_ops_common.cpp and pto_codegen.cpp implement proper accumulator SSA reuse, and new tests verify the fix works correctly.
Out of Scope Changes check	✅ Passed	All changes are focused on fixing the matmul_acc accumulator buffer issue: internal codegen logic, accumulator SSA tracking, tile allocation deduplication, and related tests. No extraneous changes detected.

gemini-code-assist

Code Review

This pull request modifies the PTO codegen to handle in-place accumulation operations (matmul_acc and gemv_acc) by reusing the accumulator's SSA value for both input and output. It updates make_acc_codegen to bind the result to the input accumulator's buffer and modifies the assignment visitor to skip redundant tile allocations for these operations. A new unit test verifies that loop-carried accumulators correctly share buffers in the generated MLIR. I have no feedback to provide.

coderabbitai

🧹 Nitpick comments (1)

tests/ut/codegen/test_pto_codegen.py (1)
1565-1605: Please add the tile.gemv_acc sibling case too.

This test locks down the tile.matmul_acc fix well, but the production change also updates tile.gemv_acc through the same custom codegen and alloc-suppression path. A small parametrized variant here would keep that sibling path from regressing silently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/codegen/test_pto_codegen.py` around lines 1565 - 1605, Extend the
existing test_pto_codegen_matmul_acc_uses_loop_carried_accumulator_buffer to
also cover the sibling gemv path: add a parametrized variant (or a second
similar test) that uses pl.gemv_acc instead of pl.matmul_acc (and initial accum
via pl.gemv or matching matvec op), then generate MLIR and assert the
accumulator buffer is loop-carried and used by the final store by searching for
"pto.tgemv.acc" (parallel to the existing "pto.tmatmul.acc") and verifying the
same ins/outs operand identity checks as done for matmul_acc; update the MLIR
line-match regexes to look for "pto.tgemv.acc" and the store accordingly so the
gemv_acc codegen path is locked down.
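The operand-identity check requested above does not depend on the language. The real test is written in Python, but the same idea can be sketched in C++. `AccOperands` is a hypothetical helper, and the assumed MLIR line shape (`op ins(%acc, ...) outs(%acc ...)`) is an illustration rather than exact pto syntax.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: if `line` contains `op`, return the first SSA operand
// inside ins(...) and the first SSA operand inside outs(...), so a test can
// assert that the accumulator is read and written through the same value.
std::optional<std::pair<std::string, std::string>> AccOperands(const std::string& line,
                                                               const std::string& op) {
  if (line.find(op) == std::string::npos) return std::nullopt;
  static const std::regex kInsOuts(R"(ins\(\s*(%[\w.]+).*outs\(\s*(%[\w.]+))");
  std::smatch m;
  if (!std::regex_search(line, m, kInsOuts)) return std::nullopt;
  return std::make_pair(m[1].str(), m[2].str());
}
```

Running it over both the `pto.tmatmul.acc` and the `pto.tgemv.acc` lines and asserting `first == second` would cover both sibling paths with one check.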

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 899cae86-5ec2-4643-829a-9fcf7fe00523

📥 Commits

Reviewing files that changed from the base of the PR and between e2c409d and 42efdf7.

📒 Files selected for processing (3)

src/backend/common/pto_ops_common.cpp
src/codegen/pto/pto_codegen.cpp
tests/ut/codegen/test_pto_codegen.py

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/codegen/pto/pto_codegen.cpp (1)
69-71: Add coverage for tile.gemv_acc as well.

This helper opts tile.gemv_acc into the same in-place accumulator path as tile.matmul_acc, but the new regressions only assert the matmul branch. A small MLIR test that checks pto.tgemv.acc keeps ins(acc) and outs(acc) on the same SSA would keep the sibling path from drifting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/codegen/pto/pto_codegen.cpp` around lines 69 - 71, The helper
IsInPlaceAccumulatorCall currently only returns true for "tile.matmul_acc" but
the review requests opting "tile.gemv_acc" into the same in-place accumulator
path; update IsInPlaceAccumulatorCall to check for both "tile.matmul_acc" and
"tile.gemv_acc" (i.e., include a second equality check against "tile.gemv_acc")
so the gemv branch follows the same in-place accumulator logic as matmul; ensure
the function still guards for null call and op_ like the existing code.
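A sketch of the suggested helper change, assuming simplified stand-in `Call` and `Op` structs (the real pypto IR classes and field types differ). Both accumulate ops are routed to the in-place path, and the null guards are kept.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the IR node types.
struct Op {
  std::string name_;
};
struct Call {
  std::shared_ptr<Op> op_;
};

// Opt both tile.matmul_acc and tile.gemv_acc into the in-place accumulator
// path; a null call or a call without an op is never treated as in-place.
bool IsInPlaceAccumulatorCall(const Call* call) {
  if (call == nullptr || call->op_ == nullptr) return false;
  const std::string& name = call->op_->name_;
  return name == "tile.matmul_acc" || name == "tile.gemv_acc";
}
```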

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/codegen/pto/pto_codegen.cpp`:
- Around line 244-267: The reuse key currently collapses distinct sub-views by
using memref->base_ plus the tile signature (via GetStaticMatmulOperandReuseKey)
but ignores per-view memref->byte_offset_, causing different views into the same
allocation to be aliased; update the key used by matmul_operand_reuse to include
memref->byte_offset_ (e.g., use std::make_pair(std::tuple(base_ptr,
memref->byte_offset_), *reuse_key) or otherwise append byte_offset to the key)
and also adjust the alloc_tile emission/suppression logic that skips emitting a
second pto.alloc_tile when a reuse hit occurs (the code around the NewNamedTemp/
matmul_operand_reuse lookup and the later alloc_tile suppression) so that
suppression only happens when base and byte_offset (and signature) truly match.
Ensure you reference/modify GetStaticMatmulOperandReuseKey usage,
matmul_operand_reuse, memref->base_, memref->byte_offset_, and the alloc_tile
emission path to apply the fix.

---

Nitpick comments:
In `@src/codegen/pto/pto_codegen.cpp`:
- Around line 69-71: The helper IsInPlaceAccumulatorCall currently only returns
true for "tile.matmul_acc" but the review requests opting "tile.gemv_acc" into
the same in-place accumulator path; update IsInPlaceAccumulatorCall to check for
both "tile.matmul_acc" and "tile.gemv_acc" (i.e., include a second equality
check against "tile.gemv_acc") so the gemv branch follows the same in-place
accumulator logic as matmul; ensure the function still guards for null call and
op_ like the existing code.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 506e27c2-395c-402f-9e73-2689f9bca268

📥 Commits

Reviewing files that changed from the base of the PR and between 42efdf7 and 5a1bf01.

📒 Files selected for processing (5)

include/pypto/codegen/pto/pto_codegen.h
src/backend/common/pto_ops_common.cpp
src/codegen/pto/pto_codegen.cpp
tests/st/runtime/test_matmul.py
tests/ut/codegen/test_pto_codegen.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/backend/common/pto_ops_common.cpp

coderabbitai · 2026-04-29T09:09:07Z

+  std::map<std::pair<const ir::Var*, std::string>, std::string> matmul_operand_reuse;
  for (const auto& [tile_var, tile_type] : fs_.tile_var_allocs) {
-    std::string ssa_name = NewNamedTemp(tile_var->name_hint_);
-    BindVarToMlir(tile_var, ssa_name);
-
    // Pre-populate type so body visitors (e.g., tile.reshape no-op check)
    // can query it before per-variable alloc_tile emission runs. Tile types
    // are always emitted with `v_row=?, v_col=?`; the actual extents flow
    // through the alloc_tile valid_row/valid_col operands.
    std::string type_str = GetTileBufTypeStringFromTileType(tile_type);
-    fs_.ssa_to_tile_buf_type[ssa_name] = type_str;
-
    auto memref = ir::GetDefinedMemRef(tile_type);
+    const ir::Var* base_ptr = memref->base_.get();
+
+    std::string ssa_name;
+    auto reuse_key = GetStaticMatmulOperandReuseKey(tile_type, type_str);
+    if (reuse_key.has_value() && fs_.tpop_result_vars.count(tile_var.get()) == 0) {
+      auto key = std::make_pair(base_ptr, *reuse_key);
+      auto reuse_it = matmul_operand_reuse.find(key);
+      if (reuse_it != matmul_operand_reuse.end()) {
+        ssa_name = reuse_it->second;
+      } else {
+        ssa_name = NewNamedTemp(tile_var->name_hint_);
+        matmul_operand_reuse.emplace(std::move(key), ssa_name);
+      }
+    } else {
+      ssa_name = NewNamedTemp(tile_var->name_hint_);
+    }


⚠️ Potential issue | 🟠 Major

Reuse key collapses distinct sub-buffers that only share the same allocation base.

Line 252 keys matmul_operand_reuse by memref->base_ plus the tile signature, but not by memref->byte_offset_. In this IR model, multiple views can intentionally share the same base_ while still pointing at different regions. Once that happens, Lines 718-720 suppress the second pto.alloc_tile, so both operands end up bound to the first address and later matmul/gemv ops can read the wrong L0A/L0B buffer.

💡 Suggested fix

- std::map<std::pair<const ir::Var*, std::string>, std::string> matmul_operand_reuse;
+ std::map<std::tuple<const ir::Var*, int64_t, std::string>, std::string> matmul_operand_reuse;
  for (const auto& [tile_var, tile_type] : fs_.tile_var_allocs) {
    std::string type_str = GetTileBufTypeStringFromTileType(tile_type);
    auto memref = ir::GetDefinedMemRef(tile_type);
    const ir::Var* base_ptr = memref->base_.get();
+   auto const_offset = As<ir::ConstInt>(memref->byte_offset_);
+   INTERNAL_CHECK_SPAN(const_offset != nullptr, tile_var->span_)
+       << "Expected static on-chip byte_offset for matmul operand reuse";
    std::string ssa_name;
    auto reuse_key = GetStaticMatmulOperandReuseKey(tile_type, type_str);
    if (reuse_key.has_value() && fs_.tpop_result_vars.count(tile_var.get()) == 0) {
-     auto key = std::make_pair(base_ptr, *reuse_key);
+     auto key = std::make_tuple(base_ptr, const_offset->value_, *reuse_key);
      auto reuse_it = matmul_operand_reuse.find(key);
      if (reuse_it != matmul_operand_reuse.end()) {
        ssa_name = reuse_it->second;
      } else {
        ssa_name = NewNamedTemp(tile_var->name_hint_);

Based on learnings, this codebase uses MemRef::base_ as allocation identity while preserving per-view byte_offset_/size_ separately after memory reuse.

Also applies to: 718-720
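A minimal sketch of the suggested keying. It uses a hypothetical `OperandReuse` class and an integer `base_id` in place of the real `const ir::Var*` base pointer. Two views into the same allocation at different byte offsets receive distinct SSA names, while an exact (base, offset, signature) match shares one. The SSA naming scheme is made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical sketch: reuse an SSA name only when allocation base, byte
// offset, and tile signature all match.
class OperandReuse {
 public:
  std::string SsaFor(int64_t base_id, int64_t byte_offset, const std::string& sig,
                     const std::string& name_hint) {
    auto key = std::make_tuple(base_id, byte_offset, sig);
    auto it = reuse_.find(key);
    if (it != reuse_.end()) return it->second;  // exact match: share the buffer
    std::string ssa = "%" + name_hint + "_" + std::to_string(counter_++);
    reuse_.emplace(std::move(key), ssa);
    return ssa;
  }

 private:
  std::map<std::tuple<int64_t, int64_t, std::string>, std::string> reuse_;
  int counter_ = 0;
};
```

Any later suppression of a second `pto.alloc_tile` would then only ever fire for an exact match, because a differing offset never yields a shared SSA name in the first place.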

github-project-automation Bot added this to pto project Apr 29, 2026

lwDavid self-assigned this Apr 29, 2026

lwDavid added the bug Something isn't working label Apr 29, 2026

lwDavid moved this to In Progress in pto project Apr 29, 2026

gemini-code-assist Bot reviewed Apr 29, 2026

coderabbitai Bot reviewed Apr 29, 2026

lwDavid force-pushed the issue-1213-matmul-acc-buffer branch 2 times, most recently from 93df1be to 5a1bf01 on April 29, 2026 at 09:03

coderabbitai Bot reviewed Apr 29, 2026

fix(codegen): Reuse shared matmul tile buffers

d4c6cf9

lwDavid force-pushed the issue-1213-matmul-acc-buffer branch from 5a1bf01 to d4c6cf9 on April 29, 2026 at 09:50

lwDavid wants to merge 1 commit into hw-native-sys:main from lwDavid:issue-1213-matmul-acc-buffer

lwDavid commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lwDavid commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading