[Sync] add syncFinder escape evidence tests#603

Open
TaoTao-real wants to merge 1 commit into hw-native-sys:main from TaoTao-real:codex/syncfinder-hidden-risk-repro
Conversation

@TaoTao-real
Contributor

Summary

Add focused lit tests that document why syncFinder must not escape may-path control-flow regions, and what evidence we have today from current main behavior.

This PR is intentionally explanation-only:

  • no implementation change
  • no behavior change
  • only debug-style regression/evidence tests

Why this matters

InsertSyncAnalysis uses two pieces of state during reverse scan:

  • alreadySync[pipe]
  • syncFinder[syncIndex]

For loop regions, current main carries syncFinder out of the loop body while intentionally not carrying alreadySync.
That is enough to be dangerous: an old wait inside a may-execute loop can still reactivate a matching top-level set later, rebuild alreadySync outside the loop, and then incorrectly eliminate a direct post-loop sync that is required when the loop zero-trips.

For scf.if without an explicit else, current frontend lowering materializes a virtual else region, so today we do not exercise the InsertBranchSync path that would propagate syncFinder from a true no-else region. The new branch test documents that evidence explicitly.

Added tests

1. issue_syncfinder_zero_trip_single_loop_debug.pto

Evidence for the single-loop zero-trip failure mode.

Shape:

  • top-level MTE2 -> V
  • loop-local old V -> MTE3
  • post-loop MTE3 consumer still directly depends on the original MTE2

Expected zero-trip-safe behavior:

  • keep a direct MTE2 -> MTE3 sync for the post-loop consumer

Observed on current main:

  • After Analysis contains MTE2 -> V and V -> MTE3
  • direct MTE2 -> MTE3 is missing

This is the minimal positive reproducer that shows loop syncFinder escape is functionally unsafe.
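
The zero-trip argument can be sketched as a small reachability check. This is only a toy model, not the InsertSyncAnalysis algorithm: it assumes each sync is a directed edge between pipes, that ordering is transitive along edges, and that loop-body edges exist only when the loop runs at least once. The helper names `covered` and `syncs_on_path` are hypothetical.

```python
def covered(sync_edges, src, dst):
    """Return True if dst is ordered after src through a chain of sync edges."""
    seen, stack = {src}, [src]
    while stack:
        node = stack.pop()
        for a, b in sync_edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return dst in seen


def syncs_on_path(top_level, loop_body, post_loop, trips):
    """Syncs that actually execute when the loop runs `trips` times."""
    executed_body = loop_body if trips > 0 else []
    return top_level + executed_body + post_loop


top_level = [("MTE2", "V")]        # top-level MTE2 -> V
loop_body = [("V", "MTE3")]        # loop-local V -> MTE3 (may not execute)
buggy_post = []                    # direct MTE2 -> MTE3 eliminated
safe_post = [("MTE2", "MTE3")]     # direct MTE2 -> MTE3 kept

for name, post in (("buggy", buggy_post), ("safe", safe_post)):
    for trips in (0, 1):
        edges = syncs_on_path(top_level, loop_body, post, trips)
        print(name, "trips =", trips, "covered =", covered(edges, "MTE2", "MTE3"))
```

In this model the buggy placement covers the post-loop consumer only when the loop body executes, while keeping the direct sync covers it on both paths.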

2. issue_syncfinder_zero_trip_nested_loop_debug.pto

Evidence that the same problem composes across nested loops.

Shape:

  • top-level MTE2 -> V
  • nested may-path old V -> MTE3
  • post-loop MTE3 consumer still directly depends on the original MTE2

Expected zero-trip-safe behavior:

  • keep a direct MTE2 -> MTE3 sync for the post-loop consumer

Observed on current main:

  • After Analysis still drops the direct MTE2 -> MTE3

This shows the problem is not limited to a single loop layer.

3. syncfinder_if_virtual_else_safe_debug.pto

Branch-side contrast case.

Shape:

  • top-level MTE2 -> V
  • scf.if then-branch consumes the V result with MTE3
  • post-if MTE3 consumer directly depends on the original MTE2

Observed on current main:

  • translator emits IF_BEGIN, ELSE_BEGIN, and a virtualElse placeholder even when source IR has no explicit else
  • After Analysis still keeps the direct MTE2 -> MTE3

This is important because it documents the current evidence for the branch case:

  • today, user-level scf.if without else does not become a true “then-only region” in SyncIR
  • so current main does not reproduce the loop-style bug on this path
  • if SyncIR ever starts representing a true no-else may-region and propagates syncFinder outward, that would need the same scrutiny as loop propagation

Validation

Ran these checks with current ptoas plus LLVM FileCheck:

  • issue_syncfinder_zero_trip_single_loop_debug.pto
  • issue_syncfinder_zero_trip_nested_loop_debug.pto
  • syncfinder_if_virtual_else_safe_debug.pto

All checks passed.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces several LIT test cases to reproduce synchronization issues in the syncFinder analysis, specifically focusing on zero-trip nested and single loops, as well as branch analysis involving virtual else blocks. The review feedback correctly identifies that the CHECK-NOT directives in the loop-related tests are misplaced. Due to the reverse-scan nature of the InsertSyncAnalysis, a pipe_barrier is processed before a wait_flag, meaning the barrier appears first in the debug output. The current placement of CHECK-NOT before the barrier check would fail to catch the presence of the wait_flag if the bug were fixed, rendering the test ineffective for regression testing.

Comment on lines +20 to +22
// CHECK: [ 7] COMPOUND pto.tstore [PIPE_MTE3]
// CHECK-NOT: wait_flag <PIPE_MTE2 -> PIPE_MTE3>
// CHECK: PRE : pipe_barrier <PIPE_MTE3 -> PIPE_MTE3>
medium

The CHECK-NOT directive is misplaced and will not correctly validate the absence of the synchronization flag if the bug is fixed.

In the InsertSyncAnalysis implementation, synchronization operations are added to the pipeBefore list in the order they are discovered during the reverse scan. The loop-carried dependency (which generates the pipe_barrier) is processed before the top-level dependencies (which would generate the wait_flag). Consequently, the pipe_barrier appears before any wait_flag in the debug output.

FileCheck's CHECK-NOT only ensures the pattern does not occur between the previous CHECK and the next CHECK. Currently, it only checks the gap between the COMPOUND line and the PRE : pipe_barrier line. If the bug were fixed and the wait_flag were correctly inserted, it would appear after the barrier, and this test would still pass (failing to catch the regression). Additionally, note that wait_flag <PIPE_V -> PIPE_MTE3> is also missing for this node due to the same escape issue.

// CHECK: [   7] COMPOUND pto.tstore [PIPE_MTE3]
// CHECK: PRE : pipe_barrier <PIPE_MTE3 -> PIPE_MTE3>
// CHECK-NOT: wait_flag <PIPE_MTE2 -> PIPE_MTE3>
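
The effect of the directive order can be illustrated with a deliberately simplified, line-based model of CHECK and CHECK-NOT. It is a sketch rather than FileCheck itself: it uses plain substring matching, and the "fixed" debug output below is a hypothetical shape assumed from the description above (barrier first, then the wait_flag). The function name `run_checks` is also made up for this example.

```python
def run_checks(directives, lines):
    """Simplified FileCheck: each CHECK matches the next line containing its
    pattern; each CHECK-NOT forbids its pattern between the previous CHECK
    match and the next one (or the end of input if no CHECK follows)."""
    pos = 0            # first line not yet consumed by a CHECK
    pending_not = []   # CHECK-NOT patterns waiting for the next CHECK
    for kind, pattern in directives:
        if kind == "CHECK-NOT":
            pending_not.append(pattern)
            continue
        match = next((i for i in range(pos, len(lines)) if pattern in lines[i]), None)
        if match is None:
            return False
        if any(p in line for p in pending_not for line in lines[pos:match]):
            return False
        pending_not = []
        pos = match + 1
    # Trailing CHECK-NOTs cover everything up to the end of input.
    return not any(p in line for p in pending_not for line in lines[pos:])


TSTORE = "COMPOUND pto.tstore [PIPE_MTE3]"
BARRIER = "pipe_barrier <PIPE_MTE3 -> PIPE_MTE3>"
WAIT = "wait_flag <PIPE_MTE2 -> PIPE_MTE3>"

buggy_output = ["[ 7] " + TSTORE, "PRE : " + BARRIER]
fixed_output = buggy_output + ["PRE : " + WAIT]   # assumed shape after a fix

original = [("CHECK", TSTORE), ("CHECK-NOT", WAIT), ("CHECK", BARRIER)]
reordered = [("CHECK", TSTORE), ("CHECK", BARRIER), ("CHECK-NOT", WAIT)]

print("original  on buggy/fixed:", run_checks(original, buggy_output), run_checks(original, fixed_output))
print("reordered on buggy/fixed:", run_checks(reordered, buggy_output), run_checks(reordered, fixed_output))
```

Under these assumptions the original order passes on both outputs, so it cannot tell them apart, while the reordered directives pass only while the wait_flag is absent.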

Comment on lines +18 to +20
// CHECK: [ 5] COMPOUND pto.tstore [PIPE_MTE3]
// CHECK-NOT: wait_flag <PIPE_MTE2 -> PIPE_MTE3>
// CHECK: PRE : pipe_barrier <PIPE_MTE3 -> PIPE_MTE3>
medium

The CHECK-NOT directive is misplaced here as well. Since the pipe_barrier is discovered and inserted into the pipeBefore list before the wait_flag during the reverse analysis, it will appear first in the debug output. Moving the CHECK-NOT after the barrier check ensures that the test correctly fails if the wait_flag is present anywhere in the PRE block.

// CHECK: [   5] COMPOUND pto.tstore [PIPE_MTE3]
// CHECK: PRE : pipe_barrier <PIPE_MTE3 -> PIPE_MTE3>
// CHECK-NOT: wait_flag <PIPE_MTE2 -> PIPE_MTE3>

@reedhecre

Codex Review

This comment is updated automatically by the review bot.

  • PR: #603 [Sync] add syncFinder escape evidence tests
  • Author: TaoTao-real
  • Base/Head: main / codex/syncfinder-hidden-risk-repro
  • Head SHA: 676ef493935d
  • Trigger: new open PR detected
  • Generated At: 2026-04-29T07:34:26Z
  • Status: completed

Summary

PR #603 adds two passing lit tests that codify the known zero-trip sync bug, so a real fix will look like a regression in CI.

Findings

  1. P2 Single-loop reproducer locks in the known incorrect sync result test/lit/pto/issue_syncfinder_zero_trip_single_loop_debug.pto:19

The comment above this check says the correct zero-trip behavior is to keep a direct MTE2->MTE3 sync for the post-loop tstore, but this assertion requires that sync to be absent. As a result, the test passes only while the known bug remains, and the real InsertLoopSync fix will show up as a CI regression until this test is removed or inverted.

  2. P2 Nested-loop reproducer also encodes the bug as expected output test/lit/pto/issue_syncfinder_zero_trip_nested_loop_debug.pto:21

This nested-loop variant has the same contract problem: the file documents that the post-loop tstore should keep a direct MTE2->MTE3 sync, but the check explicitly forbids it. Merging this as a normal passing lit test makes CI certify the current bug instead of guarding the intended behavior, so the eventual fix will be blocked by the test suite.
