Fix issue593 by reserving auto reserve_buffer capacity in PlanMemory #601
HecreReed wants to merge 1 commit into hw-native-sys:main
Code Review
This pull request implements support for fixed-address alloc_tile operations in the PlanMemory pass for levels 1 and 2, allowing constant addresses to be reserved before automatic planning. The changes include new pre-planning logic, validation in ptoas, and updated liveness analysis. Feedback was provided to ensure that memory size reporting correctly calculates the maximum extent across all entries when only fixed-address tiles are present.
if (autoEntries.empty()) {
  failApplyBufferInfo[rootStorageEntry->bufInfo->bufferScope] =
      rootStorageEntry->bitsOffset + rootStorageEntry->alignedConstBits;
  return;
}
When autoEntries is empty, all entries in the scope have fixed addresses. The current logic only considers rootStorageEntry when calculating the required memory size for error reporting. It should instead iterate over all scopeEntries to find the maximum extent (offset + size) to provide an accurate report.
if (autoEntries.empty()) {
  uint64_t maxAllocBits = 0;
  for (StorageEntry *entry : scopeEntries) {
    maxAllocBits = std::max(maxAllocBits,
                            entry->bitsOffset + entry->alignedConstBits);
  }
  failApplyBufferInfo[rootStorageEntry->bufInfo->bufferScope] = maxAllocBits;
  return;
}
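To make the difference concrete, here is a minimal standalone sketch of the maximum-extent calculation. `FixedEntry` and `maxExtentBits` are hypothetical stand-ins written for this illustration; only the field names `bitsOffset` and `alignedConstBits` are taken from the snippet above, and the real `StorageEntry` type in the pass may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the pass's StorageEntry; only the two fields
// used in the review suggestion are modeled here.
struct FixedEntry {
  uint64_t bitsOffset;        // start of the tile, in bits
  uint64_t alignedConstBits;  // aligned size of the tile, in bits
};

// Returns the largest (offset + size) over all entries, i.e. the number of
// bits the scope must provide to hold every fixed-address tile.
// An empty list yields 0.
uint64_t maxExtentBits(const std::vector<FixedEntry> &entries) {
  uint64_t maxBits = 0;
  for (const FixedEntry &e : entries)
    maxBits = std::max(maxBits, e.bitsOffset + e.alignedConstBits);
  return maxBits;
}
```

With entries at (offset 0, size 256), (offset 1024, size 512), and (offset 512, size 128), looking only at the first entry would report 256 bits, while the scope actually needs 1536 bits to cover the tile that ends furthest out.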
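Codex Review
This comment is updated automatically by the review bot.
Summary
PR #601 introduces a reserve_buffer capacity-accounting regression: it pre-deducts too much capacity for auto reserves with unaligned sizes, which can cause otherwise valid inputs to be wrongly rejected as overflowing.
Findings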
Here, each auto ...
be17f17 to 8505472
8505472 to f29a40a
Summary
- Reserve pto.reserve_buffer auto=true capacity before PlanMemory decides whether local buffers can be placed without reuse
- Add an issue593 regression that proves a vec reserve_buffer now forces automatic UB reuse and still passes with --enable-insert-sync

Why this fixes issue593
Issue #593 is not about enlarging UB capacity. The failure happens because PlanMemory currently plans ordinary local buffers against the full local-memory budget, often choosing the no-reuse path, and only afterwards tries to place reserve_buffer auto=true. In FlashAttention-style cases, that leaves no hole for the reserved FIFO even though reusing dead tiles would have made the program fit.
This change keeps the existing reuse algorithm, but makes auto reserve buffers participate in capacity accounting up front:
That gives Level1/Level2 code the same automatic UB reuse opportunity that users previously had to spell out manually with Level3 explicit addresses.
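The effect of up-front accounting on the reuse decision can be sketched as follows. This is a simplified illustration, not the PlanMemory implementation: `PlanMode`, `choosePlanMode`, and the flat size-sum model are all assumptions made for this example, and the real pass works on liveness and alignment information that is not modeled here.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical planning outcome: give every buffer its own slot, or fall
// back to reusing the space of dead buffers.
enum class PlanMode { NoReuse, Reuse };

// Picks the planning mode after setting aside the capacity claimed by auto
// reserve buffers. With autoReserveBits == 0 this models the old behavior,
// where ordinary buffers were checked against the full budget.
PlanMode choosePlanMode(uint64_t capacityBits,
                        const std::vector<uint64_t> &localBufferBits,
                        uint64_t autoReserveBits) {
  // Budget left for ordinary local buffers once the reserve is accounted for.
  const uint64_t budget =
      capacityBits > autoReserveBits ? capacityBits - autoReserveBits : 0;
  const uint64_t totalLocal = std::accumulate(
      localBufferBits.begin(), localBufferBits.end(), uint64_t{0});
  // The no-reuse path is only viable if every buffer fits side by side.
  return totalLocal <= budget ? PlanMode::NoReuse : PlanMode::Reuse;
}
```

In this model, two 400-bit buffers in a 1000-bit scope fit without reuse when the reserve is ignored, but a 300-bit auto reserve shrinks the budget to 700 bits, so the planner has to take the reuse path instead of failing later when it looks for a hole.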
Fixes #593.
Validation
- cmake --build build --target ptoas -j8
- build/tools/ptoas/ptoas /tmp/issue593_min_repro.pto
- build/tools/ptoas/ptoas --enable-insert-sync test/lit/pto/issue593_auto_reserve_triggers_reuse.pto
- python3.9 llvm-lit -sv --filter issue593_auto_reserve_triggers_reuse build/test/lit
- origin/main, where /tmp/issue593_min_repro.pto fails with error: 'pto.reserve_buffer' op failed to allocate local memory hole for reserve_buffer