
[WIP] Fix gm-pipe in spmd #1224

Open

yanghaoran29 wants to merge 1 commit into hw-native-sys:main from
yanghaoran29:fix-spmd3

Conversation

@yanghaoran29 Contributor

No description provided.

@yanghaoran29 yanghaoran29 changed the title Fix spmd3 [WIP] Fix gm-pipe in spmd Apr 29, 2026
coderabbitai Bot commented Apr 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR introduces SPMD (Single Program Multiple Data) support for GM-pipe buffer tensor operations. It adds infrastructure to scale buffer allocation and unpacking based on SPMD core counts, extends the compiler's IR passes to detect and handle disjoint workspace sizing, implements a new codegen API for runtime tensor scaling, and includes example models and tests demonstrating the functionality.

Changes

Cohort / File(s) — Summary

CI and Test Configuration
  .github/workflows/ci.yml
  Updated the pytest invocation to include the new SPMD GM-pipe runtime test module (test_spmd_gm_pipe_buffer.py) alongside existing tests.

Example Models
  examples/models/10_down_proj_residual_spmd_gm_pipe.py, examples/models/__init__.py
  Added a new executable example demonstrating down-projection with SPMD sharding and GM-pipe buffer verification; updated module aliases to expose down_proj_residual_spmd_gm_pipe and paged_attention_spmd.

Backend SPMD Support
  python/pypto/backend/pto_backend.py
  Modified argument unpacking for the SPMD GM-pipe buffer to compute typed GM pointers with per-core offset scaling; removed the distributed (L3+) codegen path.

IR Pass Enhancements
  src/ir/transforms/inject_gm_pipe_buffer_pass.cpp
  Updated buffer sizing logic to resolve the effective SPMD core count and multiply gm_buffer_bytes by this factor when spmd_core_num > 1, for proper per-core allocation.

Codegen API
  include/pypto/codegen/codegen_base.h
  Added a new virtual method GetTensorCreateScaleExpr() to the codegen base class, enabling subclasses to provide per-tensor scale expressions for runtime allocation sizing.

Tile Operations
  src/backend/common/pto_ops_common.cpp
  Updated tile.slice lowering to derive the result buffer type via a new helper and to embed compile-time valid-row/col values when the inferred type contains unknown dimensions.

Orchestration Codegen
  src/codegen/orchestration/orchestration_codegen.cpp, src/codegen/tensor_op_codegen.cpp
  Implemented per-tensor scale expression caching in orchestration codegen (detecting the SPMD function core_num from injected gm_pipe_buffer allocation sites); updated tensor-create codegen to apply the scale factor to first-dimension shape expressions when available (see the sketch after this table).

Runtime and Unit Tests
  tests/st/runtime/test_spmd.py, tests/ut/codegen/test_orchestration_codegen.py
  Added an SPMD golden runtime test for a matmul-with-bias kernel across GM blocks, and a unit test validating orchestration codegen block-num and scaled gm_pipe_buffer allocation matching the SPMD core count.
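To make the first-dimension scaling concrete, here is a minimal Python sketch of the behavior the Codegen API and Orchestration Codegen rows describe: when a per-tensor scale expression (the SPMD core_num) is cached for an injected gm_pipe_buffer, the emitted tensor.create multiplies dimension 0 by it. The helper below is illustrative only; it is not the project's actual GetTensorCreateScaleExpr() API, which returns the expression from a codegen subclass.

# Illustrative sketch, not the real codegen: models scaling the first
# dimension of an emitted tensor.create shape by a cached scale expression.
def emit_tensor_create_shape(shape, scale_expr=None):
    dims = [str(d) for d in shape]
    if scale_expr is not None:
        dims[0] = f"({dims[0]}) * ({scale_expr})"  # scale only dim 0
    return "{" + ", ".join(dims) + "}"

print(emit_tensor_create_shape([128, 256]))                         # {128, 256}
print(emit_tensor_create_shape([128, 256], scale_expr="core_num"))  # {(128) * (core_num), 256}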

Sequence Diagram

sequenceDiagram
    participant Orchestration as Orchestration Layer
    participant Codegen as Codegen (IR→C++)
    participant IRPass as IR Pass
    participant Backend as Backend Unpacking
    participant Runtime as Runtime Execution

    Orchestration->>IRPass: detect gm_pipe_buffer<br/>tensor allocation
    IRPass->>IRPass: resolve SPMD core_num<br/>from function attrs
    IRPass->>IRPass: scale buffer bytes<br/>by core_num
    IRPass->>Codegen: inject scaled<br/>gm_pipe_buffer param
    
    Codegen->>Codegen: cache core_num<br/>as scale expression
    Codegen->>Codegen: emit tensor.create<br/>with scaled shape
    Codegen->>Backend: generate wrapper code<br/>with SPMD unpacking
    
    Backend->>Backend: compute GM pointer offset<br/>= buffer.addr + offset<br/>* block_idx / block_num
    Backend->>Runtime: emit C++ kernel wrapper
    
    Runtime->>Runtime: execute SPMD kernel<br/>over multiple cores
    Runtime->>Runtime: each core accesses<br/>scaled buffer slice

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


Suggested reviewers

  • lyfne123

Poem

🐰 Hoppy hops and buffers scaled,
SPMD cores no longer failed!
Per-tile pointers dance with grace,
Orchestration finds its place.
GM-pipe dreams now compile bright,
Tensors flow through sharded night!

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (1 warning, 2 inconclusive)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 23.91%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive. The title is vague and uses the non-descriptive prefix '[WIP]', making the core fix unclear despite mentioning 'gm-pipe' and 'spmd'. Resolution: replace '[WIP]' with a clearer, more specific title, such as 'Fix gm-pipe buffer scaling for SPMD partitioning'.
  • Description check — ❓ Inconclusive. No pull request description was provided by the author. Resolution: add a detailed description explaining the problem being fixed, the solution approach, and how the changes achieve the stated objectives.
✅ Passed checks (2 passed)
  • Linked Issues check — ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for sharding the gm_pipe_buffer in SPMD mode to prevent memory overlap between logical blocks. Key changes include updating the InjectGMSlotBufferInPlace IR pass to scale buffer sizes by the number of SPMD cores and modifying the Python backend to generate C++ code that offsets the buffer based on the block index. Additionally, a new example and a system test were added to verify this behavior. Feedback indicates that the buffer scaling logic should be moved from the IR transformation pass to the codegen phase to align with project architectural rules.

Comment on lines +475 to +479
int64_t spmd_core_num = ResolveSpmdCoreNumFromCallers(needs_gm_param, callers, func_by_name);
if (spmd_core_num > 1) {
  gm_buffer_bytes *= spmd_core_num;
}
int64_t gm_buffer_elems = (gm_buffer_bytes + 3) / 4;

Severity: high

The logic for scaling gm_buffer_bytes by spmd_core_num and calculating gm_buffer_elems should be removed from this IR transformation pass. While the current implementation attempts to address alignment issues, repository rules specify that the __gm_pipe_buffer parameter is wired to hardware allocations during codegen, not in the IR transformation pass. This pass should only be responsible for adding the parameter to function signatures.

References
  1. The __gm_pipe_buffer parameter is wired to hardware allocations during codegen, not in the IR transformation pass. The pass only needs to add the parameter to function signatures.

@coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/pypto/backend/pto_backend.py (1)

894-1001: ⚠️ Potential issue | 🔴 Critical

Distributed execution will fail: codegen no longer emits required orchestration and task artifacts.

The generate() function no longer produces orchestration/host_orch.py and next_levels/{name}/ directories that distributed_runner.py requires. When DistributedCompiledProgram.__call__() invokes execute_distributed(), it will fail with FileNotFoundError at line 140 and line 128 of distributed_runner.py.

Current state:

  • generate() emits: orchestration/{name}.cpp (C++ only), kernel_config.py
  • distributed_runner.py expects: orchestration/host_orch.py (Python module), next_levels/{name}/ (per-function directories)

Any L3+ distributed program will crash at runtime. Either:

  1. Restore Python orchestration codegen and next_levels artifact emission in generate(), OR
  2. Update distributed_runner.py and DistributedCompiledProgram to work without these artifacts
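To make the missing-artifact failure concrete, here is a minimal sketch of the check that distributed execution effectively performs. The paths come from the comment above; the function itself is illustrative, not code from distributed_runner.py.

from pathlib import Path

def assert_distributed_artifacts(out_dir, kernel_names):
    # Paths distributed_runner.py expects, per the comment above.
    orch = Path(out_dir) / "orchestration" / "host_orch.py"
    if not orch.exists():
        raise FileNotFoundError(orch)  # the failure mode described above
    for name in kernel_names:
        per_kernel = Path(out_dir) / "next_levels" / name
        if not per_kernel.is_dir():
            raise FileNotFoundError(per_kernel)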
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python/pypto/backend/pto_backend.py` around lines 894 - 1001, The generate()
function no longer emits the Python orchestration module and per-function
next_levels artifacts required by distributed execution; restore emission inside
generate (after orchestration_codegen succeeds) to produce
orchestration/host_orch.py (containing the Python orchestration entrypoint used
by DistributedCompiledProgram) and create next_levels/{func_name}/ directories
with the expected per-kernel artifacts (e.g., MLIR/.pto and any wrapper files)
for each kernel in orch_func; update the block that currently writes
orchestration/{orch_func.name}.cpp and kernel_config.py (and associated
skip_ptoas logic) to also generate the host_orch.py content and create the
next_levels folders (using orch_result.func_name_to_id/func_name_to_core_type
and the existing units list to locate kernel artifacts) so distributed_runner.py
and DistributedCompiledProgram can find the files they expect.
🧹 Nitpick comments (1)
tests/ut/codegen/test_orchestration_codegen.py (1)

2283-2329: Add a dynamic-core_num variant for this regression test.

This test only locks in the ConstInt path (pl.spmd(4)). Please add a core_num-as-scalar-Var case to guard the dynamic launch-spec path and catch ordering/hoisting regressions around injected gm_pipe_buffer creates (a hedged sketch follows the prompt below).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ut/codegen/test_orchestration_codegen.py` around lines 2283 - 2329, Add
a second sub-case in test_spmd_gm_pipe_buffer_tensor_create_scales_with_core_num
that exercises the dynamic launch-spec path by using a scalar Var for core_num
instead of the constant pl.spmd(4): create a scalar Var (e.g., core_num =
pl.Var(1, dtype=pl.INT32) or equivalent API used in tests), call with with
pl.spmd(core_num): inside SpmdGMPipeProgram.main, run the same passes and
codegen, and assert the generated code contains the gm_pipe_buffer tensor.create
shape scaled by the dynamic core_num (use a regex that matches multiplication by
core_num rather than the literal 4). Ensure the new case references the same
program/kernel (SpmdGMPipeProgram.kernel and .main) so it catches
ordering/hoisting regressions for injected gm_pipe_buffer creates.
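As a concrete illustration of the assertion shape the suggested sub-case would use: the generated string below is a made-up stand-in for codegen output, and the regex mirrors the prompt's advice to match multiplication by the symbolic core_num rather than the literal 4.

import re

# Hypothetical stand-in for generated orchestration code with a dynamic scale.
generated = "auto gm_pipe_buffer = tensor.create({(128) * (core_num), 256});"
# The suggested assertion: the shape scales by the symbolic core_num,
# not by a constant.
assert re.search(r"\*\s*\(?core_num\)?", generated)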
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/backend/common/pto_ops_common.cpp`:
- Around line 2213-2226: The current code force-replaces the unknown marker
"v_row=?, v_col=?" in result_type using rows_const/cols_const which can produce
incorrect static valid extents; instead, call
InferSubviewTileTypeComponents(...) to compute the proper tile-type components
(or leave the result_type dynamic) and use its output to update result_type only
when it indicates an explicit valid extent; update the branch that checks
result_type.empty() && valid_row.empty() to invoke
InferSubviewTileTypeComponents(result_type, /*other needed args*/) and apply its
returned valid_row/valid_col/type components rather than performing the string
replacement on result_type with rows_const/cols_const.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 896ec4d7-9cc1-4757-ba5e-b19f2304a115

📥 Commits

Reviewing files that changed from the base of the PR and between eeb7178 and ccc2e7d.

📒 Files selected for processing (11)
  • .github/workflows/ci.yml
  • examples/models/10_down_proj_residual_spmd_gm_pipe.py
  • examples/models/__init__.py
  • include/pypto/codegen/codegen_base.h
  • python/pypto/backend/pto_backend.py
  • src/backend/common/pto_ops_common.cpp
  • src/codegen/orchestration/orchestration_codegen.cpp
  • src/codegen/tensor_op_codegen.cpp
  • src/ir/transforms/inject_gm_pipe_buffer_pass.cpp
  • tests/st/runtime/test_spmd.py
  • tests/ut/codegen/test_orchestration_codegen.py
✅ Files skipped from review due to trivial changes (1)
  • examples/models/__init__.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/ci.yml
  • examples/models/10_down_proj_residual_spmd_gm_pipe.py

Comment thread on src/backend/common/pto_ops_common.cpp (outdated)
@yanghaoran29 force-pushed the fix-spmd3 branch 2 times, most recently from 12e017c to fa52997 on April 30, 2026 at 03:23
@yanghaoran29 Contributor Author

SPMD gm_pipe_buffer Overlap Analysis and Fix Rationale

Background

The following down-projection pattern uses pl.spmd(...) and mixed-kernel
matmul/matmul_acc in one orchestration scope:

# Stage 7 & 8: Down projection + final residual writeback.
for db0 in pl.spmd((HIDDEN // DOWN_N_CHUNK) // 2, name_hint="down_proj_residual_spmd"):
    db = db0 * 2
    for di in pl.range(db, db + 2):
        d0 = di * DOWN_N_CHUNK
        mlp_chunk_0 = mlp_tile[:, 0 : DOWN_K_CHUNK]
        w_down_chunk_0 = w_down[0 : DOWN_K_CHUNK, d0 : d0 + DOWN_N_CHUNK]
        resid1_tile_chunk = resid1_tile[:, d0 : d0 + DOWN_N_CHUNK]
        down_acc = pl.matmul(mlp_chunk_0, w_down_chunk_0, out_dtype=pl.FP32)
        mlp_chunk_1 = mlp_tile[:, DOWN_K_CHUNK : 2 * DOWN_K_CHUNK]
        w_down_chunk_1 = w_down[DOWN_K_CHUNK : 2 * DOWN_K_CHUNK, d0 : d0 + DOWN_N_CHUNK]
        down_acc = pl.matmul_acc(down_acc, mlp_chunk_1, w_down_chunk_1)
        for ob in pl.pipeline(2, INTERMEDIATE // DOWN_K_CHUNK, stage=2):
            o0 = ob * DOWN_K_CHUNK
            down_mlp_chunk = mlp_tile[:, o0 : o0 + DOWN_K_CHUNK]
            w_down_chunk = w_down[o0 : o0 + DOWN_K_CHUNK, d0 : d0 + DOWN_N_CHUNK]
            down_acc = pl.matmul_acc(down_acc, down_mlp_chunk, w_down_chunk)
        out_chunk = pl.add(down_acc, resid1_tile_chunk)
        out = pl.assemble(out, pl.cast(out_chunk, target_type=pl.BF16), [0, d0])

For this pattern, codegen injects and uses __gm_pipe_buffer. The failure mode
appears only in the SPMD form, while the equivalent non-SPMD form passes.
Why non-SPMD can pass while SPMD fails

Key runtime behavior from the Simpler source (with direct code excerpts)

  1. Allocation path: alloc_tensors enters runtime and performs real allocation

runtime/src/a2a3/runtime/tensormap_and_ringbuffer/orchestration/pto_orchestration_api.h:

static inline TaskOutputTensors alloc_tensors(const Arg &args) {
    PTO2Runtime *rt = pto2_current_runtime();
    if (rt->ops->is_fatal(rt)) {
        return TaskOutputTensors{};
    }
    return rt->ops->alloc_tensors(rt, args);
}

runtime/src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2.cpp:

static TaskOutputTensors alloc_tensors_impl(PTO2Runtime *rt, const Arg &args) {
    return pto2_alloc_tensors(&rt->orchestrator, args);
}

runtime/src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cpp:

TaskOutputTensors pto2_alloc_tensors(PTO2OrchestratorState *orch, const Arg &args) {
    // ...
    PTO2OutputLayout layout = pto2_calculate_output_layout(args);
    PTO2PreparedTask prepared;
    if (!pto2_prepare_task(orch, args, layout.total_output_size, 0, &prepared)) {
        return TaskOutputTensors{};
    }

    PTO2TaskDescriptor &task = *prepared.task;
    PTO2TaskPayload &payload = *prepared.payload;
    // ...
    payload.init(args, outputs, prepared.alloc_result, layout, false);
    // ...
}

This shows that alloc_tensors is not a frontend-only abstraction: it reaches the
runtime allocation code and binds the allocation results into the task payload.

  2. SPMD dispatch is one task split into many logical blocks

runtime/src/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cpp:

void SchedulerContext::build_payload(
    PTO2DispatchPayload &dispatch_payload, PTO2TaskSlotState &slot_state, PTO2SubtaskSlot subslot,
    PTO2DeferredCompletionIngressBuffer *deferred_ingress
) {
    auto &payload = *slot_state.payload;
    int n = 0;
    for (int32_t i = 0; i < payload.tensor_count; i++) {
        dispatch_payload.args[n++] = reinterpret_cast<uint64_t>(&payload.tensors[i]);
    }
    // ...
    dispatch_payload.local_context.block_idx = slot_state.next_block_idx;
    dispatch_payload.local_context.block_num = slot_state.logical_block_num;
    // ...
}

Tensor argument addresses are reused across the whole task, while each dispatch
updates block_idx/block_num.

  3. Kernel-side block identity is read from per-dispatch local context

runtime/src/a2a3/runtime/tensormap_and_ringbuffer/common/intrinsic.h:

struct LocalContext {
    int32_t block_idx;  // Logical block index within the task [0, block_num)
    int32_t block_num;  // How many logical blocks this task requires.
    // ...
};

static __aicore__ inline int32_t get_block_idx(__gm__ int64_t *args) {
    __gm__ LocalContext *ctx =
        reinterpret_cast<__gm__ LocalContext *>(static_cast<uint64_t>(args[SPMD_LOCAL_CONTEXT_INDEX]));
    return ctx->block_idx;
}

Different logical blocks therefore share one task-level argument layout but run
with different block_idx values from local context.

Consequence

  • In the non-SPMD form (block_num = 1), only one logical block uses the workspace at
    a time, so overlap cannot happen.
  • In the SPMD form (block_num > 1), unless __gm_pipe_buffer is both
    • allocated with enough total capacity for all logical blocks, and
    • indexed to a per-block slice in the kernel wrapper,
    multiple logical blocks will read and write overlapping workspace regions
    (see the sketch below).
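A minimal sketch of the overlap, using NumPy arrays to stand in for GM workspace; sizes and names here are illustrative, not the runtime API:

import numpy as np

elems_per_block, block_num = 8, 2

# Unscaled buffer: capacity for one block only, shared by every block.
unscaled = np.zeros(elems_per_block)
for block_idx in range(block_num):
    unscaled[:] = block_idx            # every block writes the same region

# Scaled buffer plus per-block slicing: disjoint regions per block.
scaled = np.zeros(elems_per_block * block_num)
for block_idx in range(block_num):
    lo = block_idx * elems_per_block
    scaled[lo : lo + elems_per_block] = block_idx

assert unscaled[0] == block_num - 1    # overlap: the last writer won
assert scaled[0] == 0 and scaled[-1] == block_num - 1  # no overlap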

What this change set fixes

  1. Allocation-side scaling (host/orchestration side)
    For the injected gm pipe tensor.create, the allocation shape is scaled by the
    launch core_num during orchestration codegen (instead of changing the
    IR-visible shape).

  2. Wrapper-side sharding (kernel arg unpacking side)
    For SPMD kernels, the generated wrapper maps the __gm_pipe_buffer pointer to
    base + block_idx * elems_per_block, which guarantees each logical block a
    disjoint workspace slice (sketched after this list).

  3. Sample alignment with the block-level sharding contract
    The added ST sample uses SPMD by block index and does not combine the UP_DOWN
    split for this gm pipe case, because the current sharding contract is
    block-level (not lane-level subblock sharding).
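A minimal sketch of the two halves together, mirroring (but not quoting) the generated wrapper's base + block_idx * elems_per_block contract; the function name and the 4-byte element size are illustrative assumptions:

# Illustrative: per-block workspace pointer over a block_num-scaled buffer.
def shard_gm_pipe_buffer(base_addr, total_elems, block_idx, block_num, elem_bytes=4):
    assert total_elems % block_num == 0, "allocation must be scaled by block_num"
    elems_per_block = total_elems // block_num
    return base_addr + block_idx * elems_per_block * elem_bytes

# A 4x-scaled buffer of 4 * 256 elements yields disjoint, adjacent slices.
addrs = [shard_gm_pipe_buffer(0x1000, 4 * 256, i, 4) for i in range(4)]
assert addrs == [0x1000, 0x1400, 0x1800, 0x1C00]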

Evidence from the added sample

Sample test:
tests/st/runtime/test_spmd.py::TestSPMDOperations::test_spmd_gm_pipe_buffer_golden

  • Before the latest sample correction (historical run):
    • result: FAIL
    • symptom: golden mismatch (1024/2048 mismatched elements)
  • After the latest commit and reinstall (current run):
    • result: PASS
    • command mode: task-submit + pytest ST

This delta is consistent with the overlap hypothesis and confirms the necessity
of:

  • per-block capacity guarantee at allocation time, and
  • per-block pointer sharding at kernel entry,
  • plus sample-side execution pattern aligned to current sharding semantics.

- Add GetTensorCreateScaleExpr; orchestration scales injected gm_pipe_buffer
  tensor.create by launch core_num; pto_backend shards __gm_pipe_buffer by block_idx.
- Compute gm_buffer_elems after upward propagation in inject_gm_pipe_buffer.
- Orchestration: skip inner scalar IVs absent from wrapper signature; map tuple
  outputs when SPMD wrappers prefix auxiliary tuple fields.
- Reject non-bijective return permutations in normalize_return_order.
- Add ST golden and UT coverage for SPMD gm_pipe_buffer path.
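On the normalize_return_order bullet above: "non-bijective" means the return permutation must hit every output index exactly once. A hypothetical sketch of such a validity check, not the actual pypto implementation:

def is_bijective_permutation(order, n):
    # Valid iff `order` is a bijection on {0, ..., n-1}.
    return sorted(order) == list(range(n))

assert is_bijective_permutation([2, 0, 1], 3)      # a valid reordering
assert not is_bijective_permutation([0, 0, 2], 3)  # 0 repeated, 1 missing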