[ADR] VF fusion PIPE_V barrier coarsening design by TaoTao-real · Pull Request #605 · hw-native-sys/PTOAS

TaoTao-real · 2026-04-29T09:32:29Z

Summary

Add an architecture-level ADR for reducing redundant pipe_barrier(PIPE_V) insertion in VF-fusible vector chains.

Why

Current auto sync tends to insert PIPE_V barriers conservatively for same-pipe dependencies, which can over-serialize vector chains (e.g. FA/softmax style sequences) and leave performance on the table.

What is included

New design doc:
- docs/designs/ptoas-vf-fusion-pipev-barrier-optimization-adr.md
Proposed architecture:
- VF segment modeling
- risk-matrix based keep/drop decisions
- unknown-safe conservative fallback
- phased rollout and regression gates

Scope

Docs-only ADR PR
No code/behavior change in this PR

Follow-up

Implementation will be tracked in follow-up PRs with strict regression gates (issue428/454 + FA representative cases).

reedhecre · 2026-04-29T09:40:55Z

Codex Review

该评论由 review 机器人自动更新。

PR: [ADR] VF fusion PIPE_V barrier coarsening design #605 [ADR] VF fusion PIPE_V barrier coarsening design
Author: TaoTao-real
Base/Head: main / codex/adr-vf-pipev-barrier
Head SHA: 0a02c82cc99a
Trigger: 检测到新的 open PR
Generated At: 2026-04-29T09:40:54Z
Status: completed

Summary

The ADR's pairwise coarsening model can remove required same-pipe barriers, and its validation criteria are not aligned with the existing A5 PIPE_V barrier contract.

Findings

P1 Adjacent-pair VF segmentation can delete barriers that protect non-adjacent hazards docs/designs/ptoas-vf-fusion-pipev-barrier-optimization-adr.md:112

The proposal reasons about PIPE_V barriers only through adjacent A -> B pairs and segment boundaries. Current InsertSync is not local in that way: InsertSeqSync scans all earlier ops for each now op, and a same-pipe barrier before now records the original producer (lib/PTO/Transforms/InsertSync/InsertSyncAnalysis.cpp, around lines 153-176 and 350-356). A barrier before C can therefore represent an A -> C hazard even when the immediate predecessor B is unrelated. In a sequence like A: write ub0; B: write ub1; C: read ub0, both adjacent pairs look safe, but the barrier before C is still required. A segmenter that only decides on adjacent pairs would absorb C into the same segment and drop a required barrier, producing wrong results on same-pipe RAW/WAW chains. This also contradicts the current same-pipe contract exercised by test/samples/Sync/test_inject_sync_intra_pipe_barrier.pto.

P2 The acceptance criteria ignore that A5 already strips `PIPE_V` intra-pipe barriers docs/designs/ptoas-vf-fusion-pipev-barrier-optimization-adr.md:183

The ADR treats dense PIPE_V barriers as a general runtime problem and uses a drop in pipeVBarrierAfter as a success criterion. That is not true for A5 today: SyncCodegen::CreateBarrierOp already returns early for PIPE_V barriers on A5, and LoweringSyncToPipe erases any remaining PIPE_V barriers there. So InsertSync-side coarsening does not change emitted A5 kernels, while on A2/A3 the same barrier is still the correctness mechanism for same-pipe hazards. Without scoping the target architecture and whether the metric is counted in syncIR versus generated code/runtime, this ADR can appear successful without affecting the kernels it is trying to speed up, or encourage applying an A5-specific assumption to architectures where the barrier is still required.

gemini-code-assist

Code Review

This pull request proposes an ADR to optimize PIPE_V barriers in VF fusion scenarios, reducing redundant synchronization in vector chains like FA and softmax through segment-level modeling and risk-based operator classification. Reviewer feedback suggests enhancing the design by addressing interactions with existing Gather/Scatter logic, providing hardware constraints for stateful operations, specifying analysis tools for alias detection, and defining heuristics for the proposed optimization levels.

gemini-code-assist · 2026-04-29T09:41:41Z

+集成策略：
+
+1. `InsertSyncAnalysis` 仍先建全量候选同步边（保持正确性基线）。
+2. 在 `RemoveRedundantSync` 之前增加“VF 段精简”步骤：


The existing PTOInsertSync pass (line 127) disables redundancy elimination for kernels with Gather/Scatter operations due to correctness risks. Since the ADR proposes a new optimization step for PIPE_V barriers, it should explicitly address whether this new analysis will also be gated by hasGatherScatterLikeOps or how it intends to safely handle these high-risk operations where the current logic defaults to conservative behavior.

gemini-code-assist · 2026-04-29T09:41:41Z

+
+问题不是“完全移除 `PIPE_V` barrier”，而是区分：
+
+1. 必要 barrier：防止真实读写冲突、跨阶段可见性问题、特殊算子内部临时缓冲风险。


The ADR mentions "cross-stage visibility issues" as a reason for necessary barriers. Defining the specific hardware constraints (e.g., internal buffer synchronization or pipeline hazards) would help in creating a more precise VF_STATEFUL_OR_UNKNOWN classification.

gemini-code-assist · 2026-04-29T09:41:41Z

+1. `A/B` 在同一控制域、同一 VF 候选段内。
+2. 分类组合在“可连续执行矩阵”中允许。
+3. 依赖为 SSA 前向可见，不要求额外内存可见性屏障。
+4. alias/slice 可证明无额外冲突。


Clarify whether the "alias/slice" proof will leverage the existing MemoryDependentAnalyzer or if a specialized analysis for vector instruction offsets and strides is required to detect intra-buffer overlaps.

gemini-code-assist · 2026-04-29T09:41:41Z

+建议新增 pass 选项（先默认关闭，灰度）：
+
+1. `--enable-vf-barrier-coarsening`
+2. `--vf-barrier-coarsening-level=[safe|balanced|aggressive]`


The ADR should define the heuristics for balanced and aggressive levels. For example, will aggressive allow merging segments across certain control flow boundaries or relax the safety checks for REDUCTION operations?

docs: add ADR for VF fusion PIPE_V barrier coarsening

0a02c82

gemini-code-assist Bot reviewed Apr 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ADR] VF fusion PIPE_V barrier coarsening design#605

[ADR] VF fusion PIPE_V barrier coarsening design#605
TaoTao-real wants to merge 1 commit intohw-native-sys:mainfrom
TaoTao-real:codex/adr-vf-pipev-barrier

TaoTao-real commented Apr 29, 2026

Uh oh!

reedhecre commented Apr 29, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		问题不是“完全移除 `PIPE_V` barrier”，而是区分：

		1. 必要 barrier：防止真实读写冲突、跨阶段可见性问题、特殊算子内部临时缓冲风险。

Conversation

TaoTao-real commented Apr 29, 2026

Summary

Why

What is included

Scope

Follow-up

Uh oh!

reedhecre commented Apr 29, 2026

Codex Review

Summary

Findings

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants