Skip to content

[ADR] InsertSync memory-phi/canonical-writer dedup architecture#604

Open
TaoTao-real wants to merge 1 commit intohw-native-sys:mainfrom
TaoTao-real:codex/adr-canonical-writer-memory-phi
Open

[ADR] InsertSync memory-phi/canonical-writer dedup architecture#604
TaoTao-real wants to merge 1 commit intohw-native-sys:mainfrom
TaoTao-real:codex/adr-canonical-writer-memory-phi

Conversation

@TaoTao-real
Copy link
Copy Markdown
Contributor

@TaoTao-real TaoTao-real commented Apr 29, 2026

Summary

This PR adds an ADR for architecture-level InsertSync improvement using Memory-Phi / Canonical-Writer modeling to deduplicate semantically equivalent sync chains in loop + if/else scenarios.

Why

Current path-intersection logic merges sync state only, but cannot merge semantically equivalent dependency edges created from branch-local physical writers. This can lead to duplicated loop-carried chains (including extra seed/drain) and inflated sync count.

What is included

  • New design doc: docs/designs/ptoas-insertsync-memory-phi-canonical-writer-design.md
  • Architecture decisions, data model, invariants, algorithm phases, validation gates, and risk controls

Scope

  • Docs-only ADR PR
  • No implementation changes in this PR

Follow-up implementation

Implementation will be tracked in follow-up PRs, gated by regression suites including #428/#454 and issue-style workloads (#226/#233/FA).

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a design document for a new Memory-Phi and Canonical-Writer architecture within the InsertSync layer to optimize synchronization chains in complex control flows. The review feedback highlights the need for better definitions of key terms like 'writer equivalence' and 'loop-carried candidates,' requests clarification on the scopeId field, points out inconsistencies in the SyncDedupKey definition across sections, and notes a typo in the document's date.

# PTOAS InsertSync: Memory-Phi / Canonical-Writer 去重合并架构设计(ADR)

- Status: Proposed
- Date: 2026-04-29
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The date 2026-04-29 appears to be in the future. If this is a typo, please correct it to the actual date of proposal to avoid confusion about the document's timeline.


建议新增或扩展如下类型:

1. `CanonicalMemKey { rootBuffer, groupId, depKind, scopeId }`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The scopeId field in CanonicalMemKey could benefit from more explanation. Could you clarify what this ID represents? For instance, is it a unique identifier for a control flow scope (like a loop or branch) from the region tree mentioned in section 5.1? A more detailed description would help in understanding its role in the memory key.

2. `CanonicalWriterId { kind=Def|BranchPhi|LoopPhi, id }`
3. `MemoryPhiNode { kind, incomingDefs, mergeScopeId }`
4. `CanonicalDepEdge { producer, consumer, srcPipe, dstPipe, depKind, loopScopeId }`
5. `SyncDedupKey { producer, consumerAnchor, srcPipe, dstPipe, depKind, loopScopeId, slotClass }`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There appears to be a slight inconsistency in the definition of SyncDedupKey between sections.

  • Section 4.2 (line 59) describes the key with producerCanonicalWriter and loopScope.
  • Section 5.2 (line 94) defines it with producer and loopScopeId.

To improve clarity and ensure consistency, please unify these terms. For example, you could clarify if producer is the CanonicalWriterId and use either loopScope or loopScopeId consistently throughout the document.

1. 收集 `def/use` 与控制流作用域。
2. 按 `(rootBuffer, groupId)` 建立 `MemoryDef` 序列。
3. 在 `if/else` 汇合创建 `BranchPhi`。
4. 在循环头创建 `LoopPhi`(若存在 loop-carried 候选)。
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The document states that a LoopPhi is created "若存在 loop-carried 候选" (if a loop-carried candidate exists). This condition is a bit vague. Could you please elaborate on the criteria for identifying a "loop-carried candidate"? Providing more detail on this would make the algorithm clearer and reduce ambiguity during implementation.


缓解策略:

1. 合并前置条件必须包含控制域一致性、writer 等价性、alias 可证明性。
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In the mitigation strategies, you mention "writer 等价性" (writer equivalence) as a precondition for merging. This term is ambiguous. Does it refer to writers being of the same operation type, writing to the same buffer, or something else? For example, section 1 mentions merging writes from tmatmul and tmatmul.acc. Clarifying what constitutes "equivalence" in this context is crucial for ensuring correctness.

@reedhecre
Copy link
Copy Markdown

Codex Review

该评论由 review 机器人自动更新。

Summary

ADR #604 has two correctness-level design gaps in the canonical dependency model and is missing a loop zero-trip regression gate in its validation plan.

Findings

  1. P1 Canonical writer model does not cover WAR and ACC read/read hazards docs/designs/ptoas-insertsync-memory-phi-canonical-writer-design.md:100

Lines 98-103 route every MemoryDependentAnalyzer candidate through CanonicalWriterAnalysis, and the core types only model CanonicalWriterId / CanonicalDepEdge { producer, consumer, ... }. The current pass still needs non-writer-source hazards: IsMemInfoHasDependency inserts syncs for def/use (WAR) and for the special cross-pipe ACC use/use case because those read/read sequences can fail on device. Those edges do not have a source writer, so the proposed schema cannot represent them. If implemented as written, this will either drop required syncs or miscanonicalize them through an unrelated writer.

  1. P1 CanonicalMemKey drops subview/range information and can merge unrelated slices docs/designs/ptoas-insertsync-memory-phi-canonical-writer-design.md:47

The ADR builds one logical def chain per (rootBuffer, groupId) and the key types only keep rootBuffer/groupId plus scope metadata (lines 47, 82, 90, 124). The existing alias model is range-sensitive: BaseMemInfo carries baseAddresses and allocateSize, and overlap checks use those byte ranges before treating two accesses as the same memory. The repo already has real same-root disjoint-slice patterns such as test/lit/pto/subview_tmatmul_acc_slices_a5.pto. Unless groupId is explicitly defined to encode offset/size-equivalence, BranchPhi/LoopPhi can merge unrelated slices of the same root buffer and dedup away syncs that are still required.

  1. P2 Validation plan omits the existing zero-trip loop regression gate docs/designs/ptoas-insertsync-memory-phi-canonical-writer-design.md:195

Section 11.1 only gates issue428 and issue454, but this ADR also rewrites loop-header/back-edge handling (LoopPhi, seed/drain, eventId decisions). The current implementation has dedicated zero-iteration correctness logic, and the repo already carries test/lit/pto/issue533_loop_zero_trip_sync_regression.pto to ensure post-loop waits are preserved when a loop executes zero times. Leaving that regression out of the must-pass list means a first-iteration/zero-trip regression in the new loop canonicalization can slip through CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants