[ADR] VF fusion PIPE_V barrier coarsening design #605
TaoTao-real wants to merge 1 commit into hw-native-sys:main
Codex Review
This comment is updated automatically by the review bot.
Summary
The ADR's pairwise coarsening model can remove required same-pipe barriers, and its validation criteria are not aligned with the existing A5
Findings
The proposal reasons about
The ADR treats dense |
Code Review
This pull request proposes an ADR to optimize PIPE_V barriers in VF fusion scenarios, reducing redundant synchronization in vector chains like FA and softmax through segment-level modeling and risk-based operator classification. Reviewer feedback suggests enhancing the design by addressing interactions with existing Gather/Scatter logic, providing hardware constraints for stateful operations, specifying analysis tools for alias detection, and defining heuristics for the proposed optimization levels.
Integration strategy:

1. `InsertSyncAnalysis` still builds the full set of candidate sync edges first (preserving the correctness baseline).
2. Add a "VF segment reduction" step before `RemoveRedundantSync`:
The existing PTOInsertSync pass (line 127) disables redundancy elimination for kernels with Gather/Scatter operations due to correctness risks. Since the ADR proposes a new optimization step for PIPE_V barriers, it should explicitly address whether this new analysis will also be gated by hasGatherScatterLikeOps or how it intends to safely handle these high-risk operations where the current logic defaults to conservative behavior.
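One conservative reading of this comment is that the new coarsening step reuses the same gate as the existing redundancy elimination. The sketch below illustrates that option only. The op names in `GATHER_SCATTER_LIKE` and both function names are hypothetical stand-ins, since the actual definition of `hasGatherScatterLikeOps` is not shown in this thread.

```python
# Hypothetical op names; the real set checked by hasGatherScatterLikeOps
# is not visible in this thread.
GATHER_SCATTER_LIKE = {"gather", "scatter"}


def has_gather_scatter_like_ops(ops):
    """Return True if any op name in the kernel is gather/scatter-like."""
    return any(op in GATHER_SCATTER_LIKE for op in ops)


def should_run_vf_coarsening(ops, enabled):
    """Run coarsening only when it was requested and the kernel
    contains no gather/scatter-like ops (conservative gating)."""
    return enabled and not has_gather_scatter_like_ops(ops)
```

With this gating, a kernel that contains any gather/scatter-like op keeps the full set of barriers regardless of the option setting, which matches the conservative default the comment describes.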
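The problem is not to "completely remove `PIPE_V` barriers" but to distinguish:

1. Necessary barriers: those that prevent real read/write conflicts, cross-stage visibility problems, and risks from temporary buffers internal to special operators.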
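1. `A/B` are in the same control region and the same VF candidate segment.
2. Their classification combination is allowed by the "consecutive-execution matrix".
3. The dependency is SSA forward-visible and requires no additional memory-visibility barrier.
4. alias/slice analysis can prove there is no additional conflict.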
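The four conditions above can be read as a single predicate over an adjacent pair of vector ops. The following is a minimal sketch of that pairwise check, not the ADR's implementation. The `VecOp` fields, the class names, and the contents of `ALLOWED_PAIRS` are all hypothetical, and the SSA and alias results are passed in as booleans because the analyses that would produce them are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VecOp:
    name: str
    region: int    # hypothetical id of the enclosing control region
    segment: int   # hypothetical id of the VF candidate segment
    op_class: str  # hypothetical risk class, e.g. "elementwise" or "reduce"


# Hypothetical "consecutive-execution matrix": ordered class pairs
# that may run back to back without a PIPE_V barrier.
ALLOWED_PAIRS = {("elementwise", "elementwise")}


def can_drop_barrier(a, b, ssa_forward_only, alias_free):
    """Return True only when all four stated conditions hold for (a, b)."""
    same_scope = a.region == b.region and a.segment == b.segment  # condition 1
    class_ok = (a.op_class, b.op_class) in ALLOWED_PAIRS           # condition 2
    return same_scope and class_ok and ssa_forward_only and alias_free  # 3 and 4
```

The predicate fails closed: if any input is unknown, the caller would pass `False` and the barrier stays. It only encodes the pairwise conditions as written, so any concern about pairwise reasoning raised in the review applies to this sketch as well.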
Suggested new pass options (disabled by default at first, for a gradual rollout):

1. `--enable-vf-barrier-coarsening`
2. `--vf-barrier-coarsening-level=[safe|balanced|aggressive]`
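To make the "disabled by default" behaviour concrete, here is a small sketch that models the two proposed flags with Python's `argparse`. It is a stand-in for whatever option mechanism the actual pass would use. The default level of `safe` and the helper names are assumptions, and the sketch says nothing about what each level would permit.

```python
import argparse


def build_parser():
    """Model the two proposed flags; coarsening is off unless requested."""
    parser = argparse.ArgumentParser(prog="vf-coarsening-options-sketch")
    parser.add_argument("--enable-vf-barrier-coarsening", action="store_true")
    parser.add_argument(
        "--vf-barrier-coarsening-level",
        choices=["safe", "balanced", "aggressive"],
        default="safe",  # assumed default; the ADR excerpt does not state one
    )
    return parser


def effective_level(argv):
    """Return the active level, or None when coarsening is disabled."""
    args = build_parser().parse_args(argv)
    if not args.enable_vf_barrier_coarsening:
        return None
    return args.vf_barrier_coarsening_level
```

In this model, setting only the level has no effect: the level is ignored unless `--enable-vf-barrier-coarsening` is also passed, which keeps the existing behaviour as the baseline during rollout.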
Summary
Add an architecture-level ADR for reducing redundant `pipe_barrier(PIPE_V)` insertion in VF-fusible vector chains.
Why
Current auto sync tends to insert PIPE_V barriers conservatively for same-pipe dependencies, which can over-serialize vector chains (e.g. FA/softmax style sequences) and leave performance on the table.
What is included
docs/designs/ptoas-vf-fusion-pipev-barrier-optimization-adr.md
Scope
Follow-up
Implementation will be tracked in follow-up PRs with strict regression gates (issue428/454 + FA representative cases).