
fix(ir): Rewrite lane1 replay loop state #1236

Open
lwDavid wants to merge 1 commit into hw-native-sys:main from lwDavid:fix/deepseek-nosplit-lane1-loop


@lwDavid
Contributor

@lwDavid lwDavid commented Apr 30, 2026

Summary

Fix no-split dual-AIV lane1 replay so loop-carried tile state uses the zero-valid replay tiles instead of stale lane0 variables.

Add regression coverage for lane1 replay loops with accumulator init values.

Testing

  • cmake --build build --parallel
  • pytest tests/ut/ir/transforms/test_split_vector_kernel.py -v
  • ruff check and ruff format --check
  • DeepSeek v3_2 representative compile repros on a2a3sim, a2a3, and a5

@coderabbitai

coderabbitai Bot commented Apr 30, 2026

📝 Walkthrough

Extends lane-1 replay reconstruction to explicitly rewrite loop metadata: it rebuilds iter_args_ by substituting init values and recalculating types for TileType cases, rebuilds return_vars_ to reflect the updated iter-arg types, and substitutes loop bounds (start_, stop_, step_, chunk_config_->size) as well as the WhileStmt condition_. A new unit test validates accumulator tile initialization in dual-dispatch AIV kernel lowering.
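To make the iter-arg and return-var rebuild concrete, here is a minimal C++ sketch over a toy IR. `Type`, `Var`, `IterArg`, `ReplacementMap`, `RebuildIterArgs`, and `RebuildReturnVars` are hypothetical stand-ins, not the pass's real classes, and the rule that a tile-typed iter arg takes its type from the substituted init value is an assumption about what "recalculating types for TileType cases" means.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy IR type: scalar, or a tile with a {rows, cols} valid shape.
struct Type {
  bool is_tile = false;
  std::array<int, 2> valid_shape{0, 0};
};

struct Var {
  std::string name;
  Type type;
};
using VarPtr = std::shared_ptr<Var>;

// A loop-carried argument: the in-loop variable plus its initial value.
struct IterArg {
  VarPtr var;
  VarPtr init;
};

// Old variable -> replacement, shared across the whole (hypothetical) lane1 replay.
using ReplacementMap = std::map<VarPtr, VarPtr>;

VarPtr Lookup(const VarPtr& v, const ReplacementMap& m) {
  auto it = m.find(v);
  return it == m.end() ? v : it->second;
}

// Substitutes each init value; a tile-typed iter arg takes the type of its
// new init (e.g. a zero-valid lane1 tile) instead of keeping the lane0 type.
std::vector<IterArg> RebuildIterArgs(const std::vector<IterArg>& iter_args,
                                     ReplacementMap& replacements) {
  std::vector<IterArg> out;
  for (const auto& ia : iter_args) {
    VarPtr new_init = Lookup(ia.init, replacements);
    Type new_type = ia.var->type.is_tile ? new_init->type : ia.var->type;
    auto new_var = std::make_shared<Var>(Var{ia.var->name, new_type});
    replacements[ia.var] = new_var;  // uses inside the body see the new iter arg
    out.push_back({new_var, new_init});
  }
  return out;
}

// Rebuilds return vars so each one carries its paired iter arg's type.
std::vector<VarPtr> RebuildReturnVars(const std::vector<VarPtr>& return_vars,
                                      const std::vector<IterArg>& new_iter_args,
                                      ReplacementMap& replacements) {
  assert(return_vars.size() == new_iter_args.size());  // 1-to-1 pairing
  std::vector<VarPtr> out;
  for (std::size_t i = 0; i < return_vars.size(); ++i) {
    auto rv = std::make_shared<Var>(Var{return_vars[i]->name, new_iter_args[i].var->type});
    replacements[return_vars[i]] = rv;  // uses after the loop see the new var
    out.push_back(rv);
  }
  return out;
}
```

Recording each rebuilt iter arg and return var in the shared map is what lets later uses, inside the loop body and after the loop, pick up the lane1 variables instead of stale lane0 ones.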

Changes

Cohort / File(s) Summary
Loop Metadata Rewriting for Lane-1 Replay
src/ir/transforms/split_vector_kernel_pass.cpp
Extended lane-1 replay to rebuild iter_args_ by substituting init values and recalculating types when expressions are TileType, rebuild return_vars_ reflecting updated types, and substitute loop bounds/conditions (ForStmt: start_, stop_, step_, chunk_config_->size; WhileStmt: condition_). Changes are localized to loop reconstruction logic within the replay pass.
Dual-Dispatch Lane-1 Lowering Validation
tests/ut/ir/transforms/test_split_vector_kernel.py
New unit test validating lane-1 path lowering for loops with accumulator tile init values, ensuring SSA value initialization via pl.tile.full with empty TileView(valid_shape=[0, 0]) and correct accumulator reference in pl.range(...) init_values.
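The bound and condition substitution described for split_vector_kernel_pass.cpp can be sketched the same way. This assumes a hypothetical leaf-only `Expr` type and simplified `ForHeader`/`WhileHeader` structs; a real pass would substitute recursively through compound expressions with an IR mutator rather than with a single map lookup.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>

// Hypothetical expression node: a named leaf is enough for this sketch.
struct Expr {
  std::string name;
};
using ExprPtr = std::shared_ptr<Expr>;
using ExprReplacementMap = std::map<ExprPtr, ExprPtr>;

ExprPtr Substitute(const ExprPtr& e, const ExprReplacementMap& m) {
  auto it = m.find(e);
  return it == m.end() ? e : it->second;
}

// Simplified loop headers mirroring the fields named in the walkthrough.
struct ForHeader {
  ExprPtr start, stop, step;
  std::optional<ExprPtr> chunk_size;  // stands in for chunk_config_->size
};
struct WhileHeader {
  ExprPtr condition;
};

// Every bound, and the chunk size when present, goes through the map so that
// no lane0 value leaks into the lane1 loop header.
ForHeader RewriteForHeader(const ForHeader& f, const ExprReplacementMap& m) {
  ForHeader out{Substitute(f.start, m), Substitute(f.stop, m), Substitute(f.step, m),
                std::nullopt};
  if (f.chunk_size) out.chunk_size = Substitute(*f.chunk_size, m);
  return out;
}

WhileHeader RewriteWhileHeader(const WhileHeader& w, const ExprReplacementMap& m) {
  return WhileHeader{Substitute(w.condition, m)};
}
```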

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • lyfne123

Poem

🐰 Loop tiles nested, metadata rewritten with care,
Iter args bloom, return vars synchronized fair,
Lane-1 replayed in splendid array,
Accumulators initialized—hopping the vector way! 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 11.11%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Title check: ✅ Passed. The title 'fix(ir): Rewrite lane1 replay loop state' directly matches the main change in the changeset, which explicitly rewrites loop metadata for lane1 replay reconstruction.
Description check: ✅ Passed. The description is directly related to the changeset, explaining the fix for lane1 replay loop state and the added regression test coverage.
Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enhances the split_vector_kernel_pass by correctly rebuilding loop iteration arguments and return variables for lane 1 in dual-dispatch scenarios. It introduces helper functions to handle type updates for TileType and ensures that loop bounds, conditions, and chunk configurations are properly substituted. A new unit test verifies that lane 1 loops use empty accumulators as expected. Feedback was provided to add an internal check to verify the size consistency between return variables and iteration arguments in RebuildLane1LoopReturnVars to enforce IR invariants.

Comment on lines +845 to +848
std::vector<VarPtr> RebuildLane1LoopReturnVars(const std::vector<VarPtr>& return_vars,
                                               const std::vector<IterArgPtr>& iter_args,
                                               ExprReplacementMap& replacements) {
  std::vector<VarPtr> new_return_vars;

medium

In RebuildLane1LoopReturnVars, it is recommended to add an INTERNAL_CHECK to verify that return_vars and iter_args have the same size. This enforces the IR invariant that loop-carried iteration arguments and their corresponding return variables must be 1-to-1, which helps catch potential issues in IR construction or previous transformation passes.

Suggested change:
std::vector<VarPtr> RebuildLane1LoopReturnVars(const std::vector<VarPtr>& return_vars,
                                               const std::vector<IterArgPtr>& iter_args,
                                               ExprReplacementMap& replacements) {
  INTERNAL_CHECK(return_vars.size() == iter_args.size())
      << "Internal error: return_vars and iter_args sizes must match";
  std::vector<VarPtr> new_return_vars;
References
  1. Use INTERNAL_CHECK to validate internal invariants that are guaranteed by the design or preceding compiler passes, such as consistency between iter_args and return variables.
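One common way to build a stream-style check like the suggested INTERNAL_CHECK is the glog-style trick sketched below. `TOY_INTERNAL_CHECK`, `CheckMessage`, `CheckThrower`, and `PairReturnVars` are hypothetical names; the project's actual INTERNAL_CHECK macro may be implemented differently (for example, it may abort rather than throw).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Accumulates the failed condition text plus any streamed context.
class CheckMessage {
 public:
  explicit CheckMessage(const char* cond) { os_ << "Internal check failed: " << cond; }
  template <typename T>
  CheckMessage& operator<<(const T& value) {
    os_ << value;
    return *this;
  }
  std::string str() const { return os_.str(); }

 private:
  std::ostringstream os_;
};

struct CheckThrower {
  // '&' binds more loosely than '<<', so the whole message is built first.
  [[noreturn]] void operator&(const CheckMessage& msg) const {
    throw std::logic_error(msg.str());
  }
};

// Hypothetical stand-in: the message is only evaluated when `cond` is false.
#define TOY_INTERNAL_CHECK(cond) \
  if (cond) {                    \
  } else                         \
    CheckThrower() & CheckMessage(#cond)

// Pairs return vars with iter args after enforcing the 1-to-1 invariant.
template <typename R, typename I>
std::vector<std::pair<R, I>> PairReturnVars(const std::vector<R>& return_vars,
                                            const std::vector<I>& iter_args) {
  TOY_INTERNAL_CHECK(return_vars.size() == iter_args.size())
      << ": return_vars and iter_args sizes must match (" << return_vars.size()
      << " vs " << iter_args.size() << ")";
  std::vector<std::pair<R, I>> out;
  for (std::size_t i = 0; i < return_vars.size(); ++i) {
    out.emplace_back(return_vars[i], iter_args[i]);
  }
  return out;
}
```

The `if/else` shape means the passing case costs only the comparison, and the size mismatch surfaces at the rebuild site instead of as an out-of-range access later in the pass.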


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
tests/ut/ir/transforms/test_split_vector_kernel.py (1)

868-910: ⚡ Quick win

Add a companion while-loop replay regression.

This new test closes the for-loop case, but the C++ change also rewrites WhileStmt lane1 iter args / return vars / condition. A small while case with the same empty-accumulator pattern would keep that symmetric path from drifting unnoticed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/ir/transforms/test_split_vector_kernel.py` around lines 868 - 910,
Add a companion unit test mirroring
test_no_split_dual_dispatch_lane1_loop_init_uses_empty_accumulator but using a
while-loop to exercise the WhileStmt rewrite path: create a new test (e.g.,
test_no_split_dual_dispatch_lane1_while_init_uses_empty_accumulator) that builds
a pl.program where an accumulator tile (acc) is initialized empty, a while loop
consumes from aic and yields next_acc similarly to the for-loop case, then run
_run_split_vector_kernel(Before), extract lane1 from python_print(actual) and
assert the same patterns (tile.full with TileView valid_shape=[0, 0], pl.range
equivalents replaced by the while condition/iter-arg patterns, and the
incremented/store checks) so the WhileStmt handling is covered alongside the
ForStmt path.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 8a72fc3 and ff94ea3.

📒 Files selected for processing (2)
  • src/ir/transforms/split_vector_kernel_pass.cpp
  • tests/ut/ir/transforms/test_split_vector_kernel.py
