Fix(spmd_paged_attention): per-slot sync events for K/V prefetch by chenshengxin2026 · Pull Request #708 · hw-native-sys/simpler

chenshengxin2026 · 2026-04-30T09:02:20Z

Summary

Fixes the intermittent precision failure in
tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention.

Use per-slot ping-pong events for QK/PV L1 (MTE1<->MTE2) and
L0 (MTE1<->M) so each ping-pong slot has its own RAW/WAR sync.
Drain TPUSH(sij) via PIPE_FIX -> PIPE_S before record() so
AIV TPOP(sij) only observes a fully written GM FIFO.
Move the next-block TLOAD in the QK step to after record() to
avoid reintroducing a coarse PIPE_ALL barrier.
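
The per-slot idea can be illustrated with a small host-side model. This is only a sketch in Python, not the kernel code: the two threads stand in for a producer pipe (for example MTE2 loading a tile) and a consumer pipe (for example MTE1 moving it on), and the names `filled`, `freed`, and `run_pipeline` are hypothetical. Each ping-pong slot gets its own pair of semaphores, so a signal for slot 0 can never satisfy a wait that was meant for slot 1.

```python
import threading

NUM_SLOTS = 2  # ping-pong depth (assumed: two slots)


def run_pipeline(blocks):
    """Pass `blocks` through a two-slot buffer using per-slot events."""
    slots = [None] * NUM_SLOTS
    # One event pair per slot (hypothetical names, not the PR's identifiers):
    #   filled[s]: producer -> consumer, guards read-after-write (RAW)
    #   freed[s]:  consumer -> producer, guards write-after-read (WAR)
    filled = [threading.Semaphore(0) for _ in range(NUM_SLOTS)]
    freed = [threading.Semaphore(1) for _ in range(NUM_SLOTS)]  # slots start empty
    out = []

    def producer():
        for i, block in enumerate(blocks):
            s = i % NUM_SLOTS
            freed[s].acquire()   # WAR: wait until slot s has been consumed
            slots[s] = block     # stands in for the load into slot s
            filled[s].release()  # RAW: slot s now holds valid data

    def consumer():
        for i in range(len(blocks)):
            s = i % NUM_SLOTS
            filled[s].acquire()  # RAW: wait until slot s has been written
            out.append(slots[s])
            freed[s].release()   # WAR: slot s may be overwritten again

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(run_pipeline(list(range(7))))
```

Because the producer only blocks on the slot it is about to overwrite, it can run one block ahead of the consumer, which is the prefetch overlap the ping-pong scheme is meant to preserve.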

Verification

task-submit --device auto --run "python -m pytest tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention --platform a2a3 --rounds 20"

Performance

After fix:

================================================================
  Performance Summary (tensormap_and_ringbuffer)
================================================================

  Example                                   Elapsed (us)    Sched (us)     Orch (us)
  ----------------------------------------  ------------  ------------  ------------
  spmd_paged_attention (Case1)                    1325.3        1325.3           5.8
  spmd_paged_attention (Case2)                     694.4         694.4           6.1

================================================================

Before fix:

================================================================
  Performance Summary (tensormap_and_ringbuffer)
================================================================

  Example                                   Elapsed (us)    Sched (us)     Orch (us)
  ----------------------------------------  ------------  ------------  ------------
  spmd_paged_attention (Case1)                    1316.5        1316.5           6.2
  spmd_paged_attention (Case2)                     694.4         694.4           6.1

================================================================
  Benchmark complete (tensormap_and_ringbuffer): 2 passed, 0 failed (2 total)
================================================================
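
For a quick comparison of the two tables, the relative change in elapsed time can be computed directly from the reported numbers. The snippet below is just arithmetic on the values above; the helper name `percent_change` is made up for illustration.

```python
# Elapsed times in microseconds, copied from the two tables above.
before = {"Case1": 1316.5, "Case2": 694.4}
after = {"Case1": 1325.3, "Case2": 694.4}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


for case in before:
    print(f"{case}: {percent_change(before[case], after[case]):+.2f}%")
```

On these figures Case1 is about 0.67% slower after the fix and Case2 is unchanged.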

Related: #704

gemini-code-assist

Code Review

This pull request implements ping-pong buffering and per-slot synchronization for the QK and PV steps in the paged attention kernel. Key changes include the introduction of event base constants, the use of tile arrays for double buffering, and the addition of explicit flag initialization and cleanup to manage pipeline dependencies. I have no feedback to provide as the existing review comments were purely explanatory or validating.

- Replace shared EVENT_ID0/EVENT_ID1 with per-slot events for QK/PV L1 (MTE1<->MTE2) and L0 (MTE1<->M) so each ping-pong slot has its own RAW/WAR sync.
- Split QK/PV L0 left/right-tile addresses into two-entry arrays with disjoint offsets so the slot index selects an independent L0 region per iteration.
- Add a dedicated PV_PIJ_EVENT for the TPOP(pij) -> TMOV(aTile_PV) path, decoupling pij synchronization from the V-load ping-pong.
- Move the next-block K TLOAD in the QK step to after sij record() so the prefetch stays outside the C2V notification critical path.
- Set and drain all eight per-slot events at function entry/exit to keep AIC pipeline state consistent across calls.

Verification:

task-submit --device auto --run "python -m pytest tests/st/a2a3/tensormap_and_ringbuffer/spmd_paged_attention --platform a2a3 --rounds 20"
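
The "set at entry, drain at exit" rule for the eight per-slot events can be checked with a toy bookkeeping model. This is a sketch under assumptions: the group names, base values, and the `FlagCounter` class are invented for illustration, and the split into four groups of two slots (QK L1, QK L0, PV L1, PV L0) is inferred from the description rather than taken from the diff.

```python
NUM_SLOTS = 2

# Hypothetical event base constants; the PR's real names and values are not shown.
EVENT_BASES = {"QK_L1": 0, "QK_L0": 2, "PV_L1": 4, "PV_L0": 6}


def event_id(group, slot):
    """Per-slot event: group base plus slot index."""
    return EVENT_BASES[group] + slot


def all_event_ids():
    return [event_id(g, s) for g in EVENT_BASES for s in range(NUM_SLOTS)]


class FlagCounter:
    """Toy model: counts flags that were set but not yet waited on."""

    def __init__(self):
        self.pending = {}

    def set_flag(self, eid):
        self.pending[eid] = self.pending.get(eid, 0) + 1

    def wait_flag(self, eid):
        if self.pending.get(eid, 0) == 0:
            raise RuntimeError(f"wait on unset event {eid} would hang")
        self.pending[eid] -= 1

    def balanced(self):
        return all(count == 0 for count in self.pending.values())


def kernel_call(flags, num_blocks):
    for eid in all_event_ids():      # entry: pre-set so the first wait passes
        flags.set_flag(eid)
    for i in range(num_blocks):
        slot = i % NUM_SLOTS
        for group in EVENT_BASES:
            eid = event_id(group, slot)
            flags.wait_flag(eid)     # slot must be free before reuse
            flags.set_flag(eid)      # hand the slot back after use
    for eid in all_event_ids():      # exit: drain every pre-set flag
        flags.wait_flag(eid)


if __name__ == "__main__":
    flags = FlagCounter()
    kernel_call(flags, num_blocks=5)
    kernel_call(flags, num_blocks=3)  # a second call starts from a clean state
    print(sorted(all_event_ids()), flags.balanced())
```

In this model, skipping the exit drain would leave flags pending after the call, so the next call would begin with stale state; skipping the entry set would make the first wait in the loop fail.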

gemini-code-assist Bot reviewed Apr 30, 2026

chenshengxin2026 force-pushed the feat/spmd-pa-aic-kv-prefetch branch 2 times, most recently from d2fba00 to 8b61747, on April 30, 2026 09:25

chenshengxin2026 force-pushed the feat/spmd-pa-aic-kv-prefetch branch from 8b61747 to 2f57f06 on April 30, 2026 09:30

chenshengxin2026 wants to merge 1 commit into hw-native-sys:main from chenshengxin2026:feat/spmd-pa-aic-kv-prefetch