Refactor: unify PMU/L2Perf/TensorDump collectors on shared profiling framework by ChaoZheng109 · Pull Request #705 · hw-native-sys/simpler

ChaoZheng109 · 2026-04-30T03:12:38Z

Summary

- Introduce src/a2a3/platform/include/host/profiling_common/ with ProfilerBase<Derived, Module> (CRTP-based management and collector thread orchestration) and BufferPoolManager (pre-registered device buffer pool, dev↔host pointer mapping).
- Rewrite PmuCollector, L2PerfCollector, and TensorDumpCollector on top of the shared framework, collapsing three near-identical control flows into one and dropping ~2000 lines of duplicated .cpp code.
- Reorganize profiling docs: move pmu-profiling.md from src/{a2a3,a5}/docs/ to top-level docs/, add profiling-framework.md and l2-swimlane-profiling.md, refresh tensor-dump.md, and update profiling-name-map.md / runtimes.md / testing.md to point at the new locations and the per-case output_prefix layout.
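
For readers unfamiliar with CRTP, the following is a minimal sketch of how a base class templated on `<Derived, Module>` can own one shared control flow while each collector supplies only its module-specific steps. It is not the code from this PR: `PmuModule`, `PmuCollectorSketch`, `drain`, `on_record`, and `on_finish` are hypothetical names, and the collector thread that the real framework orchestrates is omitted to keep the example deterministic.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical module tag: carries the per-module record type.
struct PmuModule {
    using Record = unsigned;
};

// Shared control flow lives in the base; Derived supplies the module-specific hooks.
template <typename Derived, typename Module>
class ProfilerBase {
public:
    using Record = typename Module::Record;

    // Walk every ready buffer, hand each record to the derived collector,
    // then let it finalize. Returns the number of records processed.
    std::size_t drain(const std::vector<std::vector<Record>>& ready_buffers) {
        Derived& self = static_cast<Derived&>(*this);  // CRTP downcast, no virtual dispatch
        std::size_t processed = 0;
        for (const auto& buffer : ready_buffers) {
            for (const Record& record : buffer) {
                self.on_record(record);
                ++processed;
            }
        }
        self.on_finish();
        return processed;
    }
};

// Hypothetical collector: only the hooks differ between modules.
class PmuCollectorSketch : public ProfilerBase<PmuCollectorSketch, PmuModule> {
public:
    void on_record(unsigned value) { sum_ += value; }
    void on_finish() { finished_ = true; }
    unsigned sum() const { return sum_; }
    bool finished() const { return finished_; }

private:
    unsigned sum_ = 0;
    bool finished_ = false;
};
```

Under this pattern, a second collector would reuse `drain` unchanged and define its own `on_record` and `on_finish`, which is the kind of consolidation the summary describes.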

Testing

- Simulation tests pass
- Hardware tests pass

gemini-code-assist

Code Review

This pull request unifies the host-side infrastructure for PMU, L2 Swimlane, and Tensor Dump profiling into a shared framework, significantly reducing code duplication and improving maintainability across the a2a3 and a5 architectures. Key enhancements include a new three-bucket counter accounting model for better loss diagnostics, improved memory management via a centralized buffer pool, and the addition of completion barriers to ensure data consistency in tensor dumps. Feedback from the review suggests optimizing memory barriers in the SPSC queue logic to avoid redundancy and improve performance on weak-ordering architectures, as well as increasing the frequency of progress updates during the final data export phase.
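
To illustrate the memory-barrier feedback, here is a generic single-producer/single-consumer ring that relies only on paired release stores and acquire loads, with no additional standalone fences. It is a sketch under stated assumptions rather than the queue from this PR: `SpscRing`, `try_push`, and `try_pop` are hypothetical names, and the real queue spans device and host memory, which this host-only example does not model.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Generic SPSC ring: one slot is kept empty, so usable capacity is N - 1.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least one usable slot");

public:
    // Producer side. Returns false when the ring is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);  // only the producer writes tail_
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) {
            return false;  // full
        }
        slots_[tail] = value;
        // The release store publishes the slot write; no separate fence is needed.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);  // only the consumer writes head_
        // The acquire load pairs with the producer's release store.
        if (head == tail_.load(std::memory_order_acquire)) {
            return false;  // empty
        }
        out = slots_[head];
        head_.store((head + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

On weakly ordered CPUs each acquire/release pair already provides the required ordering, so adding a full fence next to it would be redundant work on the hot path.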

…framework

Introduce src/a2a3/platform/include/host/profiling_common/ with ProfilerBase<Derived, Module> (CRTP-based mgmt + collector thread orchestration) and BufferPoolManager (pre-registered device buffer pool, dev<->host pointer mapping).

Rewrite PmuCollector, L2PerfCollector, and TensorDumpCollector on top of it, collapsing three near-identical control flows into one and shedding ~2000 lines of duplication across the .cpp files.

Reorganize profiling docs to match the now-shared framework: move pmu-profiling.md out of src/{a2a3,a5}/docs/ to top-level docs/, add profiling-framework.md and l2-swimlane-profiling.md, refresh tensor-dump.md, and update profiling-name-map.md / runtimes.md / testing.md to point at the new locations and the per-case output_prefix layout.
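
The idea of a pre-registered buffer pool with dev<->host pointer mapping can be sketched as follows. This is an illustration only, not the BufferPoolManager interface from the PR: `BufferPoolSketch` and all of its methods are hypothetical, device addresses are modeled as plain 64-bit integers, and no real device allocation or registration call is made.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical pool: every buffer is registered up front with both of its addresses.
class BufferPoolSketch {
public:
    // Record one buffer: dev_addr is the device-visible address, host_ptr its host mapping.
    void register_buffer(std::uint64_t dev_addr, void* host_ptr) {
        dev_to_host_[dev_addr] = host_ptr;
        host_to_dev_[host_ptr] = dev_addr;
        free_list_.push_back(dev_addr);
    }

    // Hand out a free buffer (last registered or released first).
    // Returns false when the pool is exhausted.
    bool acquire(std::uint64_t& dev_addr) {
        if (free_list_.empty()) {
            return false;
        }
        dev_addr = free_list_.back();
        free_list_.pop_back();
        return true;
    }

    // Return a buffer to the pool once its contents have been consumed.
    void release(std::uint64_t dev_addr) { free_list_.push_back(dev_addr); }

    // Translate a device address to its host pointer; nullptr if unknown.
    void* to_host(std::uint64_t dev_addr) const {
        auto it = dev_to_host_.find(dev_addr);
        return it == dev_to_host_.end() ? nullptr : it->second;
    }

    // Translate a host pointer back to its device address; false if unknown.
    bool to_dev(const void* host_ptr, std::uint64_t& dev_addr) const {
        auto it = host_to_dev_.find(host_ptr);
        if (it == host_to_dev_.end()) {
            return false;
        }
        dev_addr = it->second;
        return true;
    }

private:
    std::unordered_map<std::uint64_t, void*> dev_to_host_;
    std::unordered_map<const void*, std::uint64_t> host_to_dev_;
    std::vector<std::uint64_t> free_list_;
};
```

Registering every buffer ahead of time means address translation is a lookup in a fixed table, so a collector never has to register memory while a run is in progress.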

gemini-code-assist Bot reviewed Apr 30, 2026


Comment thread src/a2a3/platform/include/host/profiling_common/profiler_base.h Outdated

Comment thread src/a2a3/platform/include/host/profiling_common/profiler_base.h Outdated

Comment thread src/a2a3/platform/src/host/tensor_dump_collector.cpp Outdated

ChaoZheng109 force-pushed the a2a3/profiling branch 3 times, most recently from 9db7c83 to 67fee48 (April 30, 2026 03:43)

ChaoZheng109 force-pushed the a2a3/profiling branch from 67fee48 to 35ef482 (April 30, 2026 05:48)

ChaoZheng109 mentioned this pull request Apr 30, 2026

[Code Health] Unify profiling abstractions across perf, dump tensor, and PMU #641

Closed

ChaoZheng109 wants to merge 1 commit into hw-native-sys:main from ChaoZheng109:a2a3/profiling
