fix(codegen): Reuse matmul acc buffer #1216
lwDavid wants to merge 1 commit into hw-native-sys:main
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough
Ensure in-place accumulator codegen uses the accumulator operand's tile-buffer SSA and type.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Code Review
This pull request modifies the PTO codegen to handle in-place accumulation operations (matmul_acc and gemv_acc) by reusing the accumulator's SSA value for both input and output. It updates make_acc_codegen to bind the result to the input accumulator's buffer and modifies the assignment visitor to skip redundant tile allocations for these operations. A new unit test verifies that loop-carried accumulators correctly share buffers in the generated MLIR. I have no feedback to provide.
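The binding rule the summary describes can be sketched in miniature. This is a hedged illustration in Python, not the project's actual C++ codegen API; `bind_assignment`, the symbol table, and the SSA naming are all invented for the example:

```python
# Minimal model of the binding rule: in-place accumulator calls bind their
# result to the accumulator operand's existing SSA value; other calls get a
# fresh temporary (which would come with its own tile allocation).
IN_PLACE_ACC_OPS = {"tile.matmul_acc", "tile.gemv_acc"}

def bind_assignment(ssa_table, counter, result_var, op, operands):
    """Return the SSA name bound to result_var; reuse the accumulator's SSA
    for in-place accumulator ops, otherwise mint a fresh temporary."""
    if op in IN_PLACE_ACC_OPS:
        acc_var = operands[-1]      # accumulator is the last operand here
        ssa = ssa_table[acc_var]    # reuse its tile-buffer SSA; no new alloc
    else:
        counter[0] += 1
        ssa = f"%t{counter[0]}"     # fresh alloc_tile result
    ssa_table[result_var] = ssa
    return ssa

ssa_table = {"a": "%t0", "b": "%t1", "acc": "%t2"}
counter = [2]
out = bind_assignment(ssa_table, counter, "c", "tile.matmul_acc", ["a", "b", "acc"])
print(out)    # %t2: the result shares the accumulator's SSA
fresh = bind_assignment(ssa_table, counter, "d", "tile.add", ["a", "b"])
print(fresh)  # %t3: a non-accumulator op still gets a fresh value
```

With this binding in place, the final store that reads the assignment result necessarily reads the updated accumulator buffer.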
🧹 Nitpick comments (1)
tests/ut/codegen/test_pto_codegen.py (1)
1565-1605: Please add the `tile.gemv_acc` sibling case too. This test locks down the `tile.matmul_acc` fix well, but the production change also updates `tile.gemv_acc` through the same custom codegen and alloc-suppression path. A small parametrized variant here would keep that sibling path from regressing silently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ut/codegen/test_pto_codegen.py` around lines 1565 - 1605, Extend the existing test_pto_codegen_matmul_acc_uses_loop_carried_accumulator_buffer to also cover the sibling gemv path: add a parametrized variant (or a second similar test) that uses pl.gemv_acc instead of pl.matmul_acc (and initial accum via pl.gemv or matching matvec op), then generate MLIR and assert the accumulator buffer is loop-carried and used by the final store by searching for "pto.tgemv.acc" (parallel to the existing "pto.tmatmul.acc") and verifying the same ins/outs operand identity checks as done for matmul_acc; update the MLIR line-match regexes to look for "pto.tgemv.acc" and the store accordingly so the gemv_acc codegen path is locked down.
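The assertion the reviewer wants, parametrized over both sibling ops, can be sketched as a pattern check. The MLIR lines below are hand-written stand-ins with assumed operand shapes, not real codegen output; a real test would run the project's codegen and match its MLIR:

```python
import re

# Hedged sketch: verify an accumulator op keeps ins(..., acc) and outs(acc)
# on the same SSA value, for both tmatmul.acc and tgemv.acc.
SAMPLES = {
    "pto.tmatmul.acc": '"pto.tmatmul.acc" ins(%a, %b, %acc0) outs(%acc0)',
    "pto.tgemv.acc": '"pto.tgemv.acc" ins(%m, %v, %acc1) outs(%acc1)',
}

def acc_operand_is_loop_carried(line):
    """True when the last ins operand and the outs operand are the same SSA."""
    m = re.search(r"ins\(([^)]*)\)\s+outs\((%[\w]+)\)", line)
    if not m:
        return False
    last_in = m.group(1).split(",")[-1].strip()
    return last_in == m.group(2)

for op_name, line in SAMPLES.items():
    assert op_name in line
    assert acc_operand_is_loop_carried(line), op_name
print("both accumulator ops reuse their buffer")
```

Parametrizing one test body over the op names keeps the two sibling paths from drifting apart independently.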
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 899cae86-5ec2-4643-829a-9fcf7fe00523
📒 Files selected for processing (3)
- src/backend/common/pto_ops_common.cpp
- src/codegen/pto/pto_codegen.cpp
- tests/ut/codegen/test_pto_codegen.py
93df1be to 5a1bf01 (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/codegen/pto/pto_codegen.cpp (1)
69-71: Add coverage for `tile.gemv_acc` as well. This helper opts `tile.gemv_acc` into the same in-place accumulator path as `tile.matmul_acc`, but the new regressions only assert the matmul branch. A small MLIR test that checks `pto.tgemv.acc` keeps `ins(acc)` and `outs(acc)` on the same SSA would keep the sibling path from drifting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/codegen/pto/pto_codegen.cpp` around lines 69 - 71, The helper IsInPlaceAccumulatorCall currently only returns true for "tile.matmul_acc" but the review requests opting "tile.gemv_acc" into the same in-place accumulator path; update IsInPlaceAccumulatorCall to check for both "tile.matmul_acc" and "tile.gemv_acc" (i.e., include a second equality check against "tile.gemv_acc") so the gemv branch follows the same in-place accumulator logic as matmul; ensure the function still guards for null call and op_ like the existing code.
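The guard structure the prompt describes can be modeled compactly. This is a Python sketch of the shape, not the repo's C++ helper; the dict-based call representation is an assumption made for illustration:

```python
IN_PLACE_ACC_OPS = ("tile.matmul_acc", "tile.gemv_acc")

def is_in_place_accumulator_call(call):
    """Guarded predicate: a call opts into the in-place accumulator path
    only if it exists, carries an op, and names one of the two acc ops."""
    if call is None or call.get("op") is None:
        return False
    return call["op"] in IN_PLACE_ACC_OPS

print(is_in_place_accumulator_call({"op": "tile.gemv_acc"}))  # True
print(is_in_place_accumulator_call({"op": "tile.matmul"}))    # False
print(is_in_place_accumulator_call(None))                     # False
```

Listing both op names in one membership check keeps matmul and gemv on the same code path, so a future third accumulator op only needs one new entry.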
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 506e27c2-395c-402f-9e73-2689f9bca268
📒 Files selected for processing (5)
- include/pypto/codegen/pto/pto_codegen.h
- src/backend/common/pto_ops_common.cpp
- src/codegen/pto/pto_codegen.cpp
- tests/st/runtime/test_matmul.py
- tests/ut/codegen/test_pto_codegen.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/backend/common/pto_ops_common.cpp
```cpp
std::map<std::pair<const ir::Var*, std::string>, std::string> matmul_operand_reuse;
for (const auto& [tile_var, tile_type] : fs_.tile_var_allocs) {
  std::string type_str = GetTileBufTypeStringFromTileType(tile_type);
  auto memref = ir::GetDefinedMemRef(tile_type);
  const ir::Var* base_ptr = memref->base_.get();

  std::string ssa_name;
  auto reuse_key = GetStaticMatmulOperandReuseKey(tile_type, type_str);
  if (reuse_key.has_value() && fs_.tpop_result_vars.count(tile_var.get()) == 0) {
    auto key = std::make_pair(base_ptr, *reuse_key);
    auto reuse_it = matmul_operand_reuse.find(key);
    if (reuse_it != matmul_operand_reuse.end()) {
      ssa_name = reuse_it->second;
    } else {
      ssa_name = NewNamedTemp(tile_var->name_hint_);
      matmul_operand_reuse.emplace(std::move(key), ssa_name);
    }
  } else {
    ssa_name = NewNamedTemp(tile_var->name_hint_);
  }
  BindVarToMlir(tile_var, ssa_name);

  // Pre-populate type so body visitors (e.g., tile.reshape no-op check)
  // can query it before per-variable alloc_tile emission runs. Tile types
  // are always emitted with `v_row=?, v_col=?`; the actual extents flow
  // through the alloc_tile valid_row/valid_col operands.
  fs_.ssa_to_tile_buf_type[ssa_name] = type_str;
}
```
Reuse key collapses distinct sub-buffers that only share the same allocation base.
Line 252 keys matmul_operand_reuse by memref->base_ plus the tile signature, but not by memref->byte_offset_. In this IR model, multiple views can intentionally share the same base_ while still pointing at different regions. Once that happens, Lines 718-720 suppress the second pto.alloc_tile, so both operands end up bound to the first address and later matmul/gemv ops can read the wrong L0A/L0B buffer.
💡 Suggested fix

```diff
- std::map<std::pair<const ir::Var*, std::string>, std::string> matmul_operand_reuse;
+ std::map<std::tuple<const ir::Var*, int64_t, std::string>, std::string> matmul_operand_reuse;
  for (const auto& [tile_var, tile_type] : fs_.tile_var_allocs) {
    std::string type_str = GetTileBufTypeStringFromTileType(tile_type);
    auto memref = ir::GetDefinedMemRef(tile_type);
    const ir::Var* base_ptr = memref->base_.get();
+   auto const_offset = As<ir::ConstInt>(memref->byte_offset_);
+   INTERNAL_CHECK_SPAN(const_offset != nullptr, tile_var->span_)
+       << "Expected static on-chip byte_offset for matmul operand reuse";
    std::string ssa_name;
    auto reuse_key = GetStaticMatmulOperandReuseKey(tile_type, type_str);
    if (reuse_key.has_value() && fs_.tpop_result_vars.count(tile_var.get()) == 0) {
-     auto key = std::make_pair(base_ptr, *reuse_key);
+     auto key = std::make_tuple(base_ptr, const_offset->value_, *reuse_key);
      auto reuse_it = matmul_operand_reuse.find(key);
      if (reuse_it != matmul_operand_reuse.end()) {
        ssa_name = reuse_it->second;
      } else {
        ssa_name = NewNamedTemp(tile_var->name_hint_);
```

Based on learnings, this codebase uses MemRef::base_ as allocation identity while preserving per-view byte_offset_/size_ separately after memory reuse.
Also applies to: 718-720
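The aliasing hazard is easy to reproduce in miniature. The Python below is an illustrative model, not the C++ codegen; view names, base ids, and signatures are invented for the example:

```python
# Two tile views share one allocation base but point at different regions.
# Keyed by (base, signature) alone, the reuse map hands the second view the
# first view's buffer; adding byte_offset to the key keeps the views distinct.
views = [
    {"name": "lhs_tile", "base": "alloc0", "byte_offset": 0,    "sig": "16x16xf16"},
    {"name": "rhs_tile", "base": "alloc0", "byte_offset": 8192, "sig": "16x16xf16"},
]

def assign_buffers(views, key_fn):
    reuse, buffers, fresh = {}, {}, 0
    for v in views:
        key = key_fn(v)
        if key not in reuse:          # only a true miss allocates a buffer;
            reuse[key] = f"%buf{fresh}"  # a hit suppresses the second alloc
            fresh += 1
        buffers[v["name"]] = reuse[key]
    return buffers

buggy = assign_buffers(views, lambda v: (v["base"], v["sig"]))
fixed = assign_buffers(views, lambda v: (v["base"], v["byte_offset"], v["sig"]))
print(buggy["lhs_tile"] == buggy["rhs_tile"])  # True: distinct views collapsed
print(fixed["lhs_tile"] == fixed["rhs_tile"])  # False: offset keeps them apart
```

The same key must gate the alloc suppression: only a hit on base, offset, and signature together may skip the second allocation.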
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/codegen/pto/pto_codegen.cpp` around lines 244 - 267, The reuse key
currently collapses distinct sub-views by using memref->base_ plus the tile
signature (via GetStaticMatmulOperandReuseKey) but ignores per-view
memref->byte_offset_, causing different views into the same allocation to be
aliased; update the key used by matmul_operand_reuse to include
memref->byte_offset_ (e.g., use std::make_pair(std::tuple(base_ptr,
memref->byte_offset_), *reuse_key) or otherwise append byte_offset to the key)
and also adjust the alloc_tile emission/suppression logic that skips emitting a
second pto.alloc_tile when a reuse hit occurs (the code around the NewNamedTemp/
matmul_operand_reuse lookup and the later alloc_tile suppression) so that
suppression only happens when base and byte_offset (and signature) truly match.
Ensure you reference/modify GetStaticMatmulOperandReuseKey usage,
matmul_operand_reuse, memref->base_, memref->byte_offset_, and the alloc_tile
emission path to apply the fix.
5a1bf01 to d4c6cf9 (Compare)
Fixes #1213
Lower in-place tile accumulator ops with the accumulator SSA as both ins and outs, and keep the assignment result bound to that same buffer so the final store reads the updated accumulator.
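The effect of that lowering on a loop-carried accumulator can be sketched abstractly. This Python model is an illustration of the failure mode, with buffer ids standing in for on-chip tiles; it is not the real IR:

```python
# Abstract model of the accumulator bug: sum partial products over k-tiles.
# `buffers` maps a buffer id to its contents; the final store reads buffer 0.
def run_loop(in_place):
    buffers = {0: 0}           # buffer 0 holds the initialized accumulator
    acc_buf, next_buf = 0, 1
    for partial in [3, 4, 5]:  # per-iteration matmul partials
        if in_place:
            buffers[acc_buf] += partial  # ins(acc)/outs(acc) on one buffer
        else:
            # Result lands in a fresh buffer each iteration, but the
            # accumulator buffer (and the final store) never see it.
            buffers[next_buf] = buffers[acc_buf] + partial
            next_buf += 1
    return buffers[0]          # final store reads the accumulator buffer

print(run_loop(in_place=True))   # 12: store sees every accumulated partial
print(run_loop(in_place=False))  # 0: updates leaked into fresh buffers
```

Keeping the assignment result bound to the accumulator's buffer makes the in-place branch the one that actually executes, so the stored value includes all iterations.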