[ADR] InsertSync memory-phi/canonical-writer dedup architecture#604
TaoTao-real wants to merge 1 commit into hw-native-sys:main
Code Review
This pull request introduces a design document for a new Memory-Phi and Canonical-Writer architecture within the InsertSync layer to optimize synchronization chains in complex control flows. The review feedback highlights the need for better definitions of key terms like 'writer equivalence' and 'loop-carried candidates,' requests clarification on the scopeId field, points out inconsistencies in the SyncDedupKey definition across sections, and notes a typo in the document's date.
# PTOAS InsertSync: Memory-Phi / Canonical-Writer Dedup-and-Merge Architecture Design (ADR)

- Status: Proposed
- Date: 2026-04-29

We suggest adding or extending the following types:

1. `CanonicalMemKey { rootBuffer, groupId, depKind, scopeId }`
The scopeId field in CanonicalMemKey could benefit from more explanation. Could you clarify what this ID represents? For instance, is it a unique identifier for a control flow scope (like a loop or branch) from the region tree mentioned in section 5.1? A more detailed description would help in understanding its role in the memory key.
2. `CanonicalWriterId { kind=Def|BranchPhi|LoopPhi, id }`
3. `MemoryPhiNode { kind, incomingDefs, mergeScopeId }`
4. `CanonicalDepEdge { producer, consumer, srcPipe, dstPipe, depKind, loopScopeId }`
5. `SyncDedupKey { producer, consumerAnchor, srcPipe, dstPipe, depKind, loopScopeId, slotClass }`
There appears to be a slight inconsistency in the definition of `SyncDedupKey` between sections.
- Section 4.2 (line 59) describes the key with `producerCanonicalWriter` and `loopScope`.
- Section 5.2 (line 94) defines it with `producer` and `loopScopeId`.

To improve clarity and ensure consistency, please unify these terms. For example, you could clarify whether `producer` is the `CanonicalWriterId`, and use either `loopScope` or `loopScopeId` consistently throughout the document.
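As an illustration of the unified naming, here is a minimal Python sketch that treats `producer` as a `CanonicalWriterId` and uses `loopScopeId` throughout. It is not the ADR's implementation: the `dedup` helper, the string-typed pipe and anchor fields, and the values `PIPE_M`, `PIPE_V`, `use0`, and `default` are assumptions made only for this example.

```python
from dataclasses import dataclass
from enum import Enum


class WriterKind(Enum):
    DEF = "Def"
    BRANCH_PHI = "BranchPhi"
    LOOP_PHI = "LoopPhi"


@dataclass(frozen=True)
class CanonicalWriterId:
    kind: WriterKind
    id: int


@dataclass(frozen=True)
class SyncDedupKey:
    # Field set follows Section 5.2; `producer` is ASSUMED to be a CanonicalWriterId.
    producer: CanonicalWriterId
    consumer_anchor: str  # hypothetical: a label for the consuming op
    src_pipe: str
    dst_pipe: str
    dep_kind: str
    loop_scope_id: int
    slot_class: str


def dedup(keys):
    """Hypothetical helper: keep the first occurrence of each key, in order."""
    seen = set()
    unique = []
    for key in keys:
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Two edges that came from different branches but share one canonical writer
# compare equal, so only one of them survives deduplication.
phi = CanonicalWriterId(WriterKind.BRANCH_PHI, 0)
then_edge = SyncDedupKey(phi, "use0", "PIPE_M", "PIPE_V", "RAW", 1, "default")
else_edge = SyncDedupKey(phi, "use0", "PIPE_M", "PIPE_V", "RAW", 1, "default")
other_loop = SyncDedupKey(phi, "use0", "PIPE_M", "PIPE_V", "RAW", 2, "default")
result = dedup([then_edge, else_edge, other_loop])
```

Because the dataclasses are frozen, equality and hashing are derived from the fields, so any difference in `loop_scope_id` or `slot_class` keeps two edges distinct.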
1. Collect `def/use` information and control-flow scopes.
2. Build a `MemoryDef` sequence per `(rootBuffer, groupId)`.
3. Create a `BranchPhi` at each `if/else` merge point.
4. Create a `LoopPhi` at the loop header (if loop-carried candidates exist).
The document states that a LoopPhi is created "若存在 loop-carried 候选" (if a loop-carried candidate exists). This condition is a bit vague. Could you please elaborate on the criteria for identifying a "loop-carried candidate"? Providing more detail on this would make the algorithm clearer and reduce ambiguity during implementation.
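One possible criterion, offered only as a starting point for discussion: a `(rootBuffer, groupId)` key is a loop-carried candidate when the loop body both writes it and reads it before any write in the same iteration (an upward-exposed read). The Python sketch below shows that check for a straight-line body, plus a branch merge that creates a `BranchPhi` only when the two incoming definitions differ. The `Access` type, both function names, and the tuple encoding of writers are hypothetical, and the check covers read-after-write dependencies only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    key: tuple      # (rootBuffer, groupId)
    is_write: bool


def loop_carried_candidates(body):
    """Return keys that are written in the body and also read before any write.

    ASSUMPTION: `body` is a straight-line list of accesses in program order.
    An upward-exposed read can observe the previous iteration's write.
    """
    written_so_far = set()
    exposed_reads = set()
    for access in body:
        if access.is_write:
            written_so_far.add(access.key)
        elif access.key not in written_so_far:
            exposed_reads.add(access.key)
    # After the loop, written_so_far holds every key written in the body.
    return exposed_reads & written_so_far


def merge_branch_defs(then_def, else_def, next_phi_id):
    """Return the writer reaching the if/else merge point.

    A BranchPhi is created only when the two branches disagree.
    """
    if then_def == else_def:
        return then_def
    return ("BranchPhi", next_phi_id, (then_def, else_def))


body = [
    Access(("acc", 0), is_write=False),  # read first: upward-exposed
    Access(("acc", 0), is_write=True),
    Access(("tmp", 0), is_write=True),   # written first: not loop-carried here
    Access(("tmp", 0), is_write=False),
]
candidates = loop_carried_candidates(body)
merged = merge_branch_defs(("Def", 1), ("Def", 2), next_phi_id=0)
unchanged = merge_branch_defs(("Def", 1), ("Def", 1), next_phi_id=0)
```

Under this reading, only `("acc", 0)` would get a `LoopPhi`; whether write-after-read and write-after-write cases should also qualify is left to the ADR to define.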
Mitigation strategies:

1. Merge preconditions must include control-scope consistency, writer equivalence, and alias provability.
In the mitigation strategies, you mention "writer 等价性" (writer equivalence) as a precondition for merging. This term is ambiguous. Does it refer to writers being of the same operation type, writing to the same buffer, or something else? For example, section 1 mentions merging writes from tmatmul and tmatmul.acc. Clarifying what constitutes "equivalence" in this context is crucial for ensuring correctness.
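To make the question concrete, the Python sketch below encodes one candidate reading of the three preconditions: two writers are "equivalent" when they target the same root buffer and group on the same pipe, even if their op names differ. This is an assumption for discussion, not the ADR's definition; the `Writer` fields, `merge_scope_id`, the `alias_proven` flag, and the pipe name `PIPE_M` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Writer:
    op_name: str
    root_buffer: str
    group_id: int
    pipe: str
    merge_scope_id: int  # hypothetical: the control scope where branches rejoin


def writers_equivalent(a, b):
    # ASSUMED reading: same buffer, group, and pipe; the op name may differ.
    return (a.root_buffer, a.group_id, a.pipe) == (b.root_buffer, b.group_id, b.pipe)


def can_merge(a, b, alias_proven):
    """All three preconditions from the mitigation list must hold."""
    same_scope = a.merge_scope_id == b.merge_scope_id
    return same_scope and writers_equivalent(a, b) and alias_proven


then_writer = Writer("tmatmul", "acc_buf", 0, "PIPE_M", merge_scope_id=3)
else_writer = Writer("tmatmul.acc", "acc_buf", 0, "PIPE_M", merge_scope_id=3)
other_scope = Writer("tmatmul", "acc_buf", 0, "PIPE_M", merge_scope_id=4)

mergeable = can_merge(then_writer, else_writer, alias_proven=True)
blocked_by_alias = can_merge(then_writer, else_writer, alias_proven=False)
blocked_by_scope = can_merge(then_writer, other_scope, alias_proven=True)
```

If the intended definition is stricter (for example, requiring the same operation type), `tmatmul` and `tmatmul.acc` would no longer merge, which is why the ADR should state the rule explicitly.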
Codex Review
This comment is updated automatically by the review bot.
Summary
ADR #604 has two correctness-level design gaps in the canonical dependency model and is missing a loop zero-trip regression gate in its validation plan.
Findings
Lines 98-103 route every
The ADR builds one logical def chain per
Section 11.1 only gates
Summary
This PR adds an ADR for architecture-level InsertSync improvement using Memory-Phi / Canonical-Writer modeling to deduplicate semantically equivalent sync chains in loop + if/else scenarios.
Why
Current path-intersection logic merges sync state only, but cannot merge semantically equivalent dependency edges created from branch-local physical writers. This can lead to duplicated loop-carried chains (including extra seed/drain) and inflated sync count.
What is included
docs/designs/ptoas-insertsync-memory-phi-canonical-writer-design.md
Scope
Follow-up implementation
Implementation will be tracked in follow-up PRs, gated by regression suites including #428/#454 and issue-style workloads (#226/#233/FA).