Category
Technical Debt (cleanup, refactor)
Component
Other (please specify in description)
Description
This is a cross-cutting code health issue across SceneTest, common worker ABI, runtime structs, and platform diagnostics collectors.
Current user-facing profiling capability already includes three distinct features: perf swimlane export, tensor dump, and PMU. However, the front-end still uses profiling to mean perf only. --enable-profiling / enable_profiling drives perf snapshots, perf output directory handling, and swimlane conversion, while dump tensor and PMU are modeled as separate one-off flags. This makes the terminology inconsistent: profiling is the umbrella concept at the product level, but profiling in the current API/CLI effectively means perf.
Perf-specific plumbing also leaks into generic runtime layers. Generic worker/runtime ABI and runtime structs carry perf-named fields such as enable_profiling, perf_data_base, perf_records_addr, and enable_profiling_flag. By contrast, dump tensor and PMU are closer to platform-owned collectors. This makes the boundary between common runtime and platform diagnostics inconsistent, and perf ends up polluting runtime internals.
In addition, perf, dump tensor, and PMU duplicate a large amount of lifecycle logic: config propagation, feature-flag publication, per-core/per-thread buffer allocation, AICPU init, host-side collection/export, artifact naming, and cleanup. These paths should be normalized behind a shared diagnostics/profiling abstraction instead of evolving as three parallel implementations.
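As a rough illustration of what a shared lifecycle contract could look like, the sketch below runs any set of collectors through the same init, collect/export, and cleanup steps. Everything in it (`Collector`, `FakePerfCollector`, `run_with_collectors`, the `<name>.bin` artifact naming) is hypothetical and is not taken from the current codebase; it is written in Python only to keep the shape easy to read.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Collector(ABC):
    """Hypothetical contract that perf, dump tensor, and PMU could share."""

    name: str

    @abstractmethod
    def init(self, num_cores: int) -> None:
        """Allocate per-core buffers."""

    @abstractmethod
    def collect(self) -> bytes:
        """Return the collected data for export."""

    @abstractmethod
    def finalize(self) -> None:
        """Release buffers."""


class FakePerfCollector(Collector):
    """Stand-in collector used only to exercise the driver below."""

    name = "perf"

    def __init__(self) -> None:
        self.buffers = []

    def init(self, num_cores: int) -> None:
        self.buffers = [bytearray(4) for _ in range(num_cores)]

    def collect(self) -> bytes:
        return b"".join(bytes(b) for b in self.buffers)

    def finalize(self) -> None:
        self.buffers = []


def run_with_collectors(collectors, num_cores, out_dir, run):
    """Drive every collector through one shared lifecycle."""
    started = []
    artifacts = {}
    try:
        for collector in collectors:
            collector.init(num_cores)
            started.append(collector)
        run()
        for collector in started:
            # Single artifact naming policy: <out_dir>/<collector name>.bin
            path = Path(out_dir) / f"{collector.name}.bin"
            path.write_bytes(collector.collect())
            artifacts[collector.name] = path
    finally:
        # Cleanup runs in reverse order, even if the run or export fails.
        for collector in reversed(started):
            collector.finalize()
    return artifacts
```

With a contract along these lines, adding a new diagnostics feature would mean implementing one collector rather than repeating the propagation, allocation, export, and cleanup steps in each path.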
Observed at commit 89003b5fccf9160bb35c48779c8d20e938aa70dc.
Related: #510
Location
simpler_setup/scene_test.py:657-691
simpler_setup/scene_test.py:859-867
simpler_setup/scene_test.py:1156-1166
simpler_setup/scene_test.py:1223-1225
simpler_setup/scene_test.py:1288-1297
simpler_setup/scene_test.py:1394-1397
src/common/task_interface/chip_call_config.h:21-26
src/common/worker/pto_runtime_c_api.h:75-98
src/common/worker/chip_worker.cpp:245-248
src/common/hierarchical/worker_manager.cpp:168-178
src/a5/runtime/host_build_graph/runtime/runtime.h:104-118
src/a5/runtime/host_build_graph/runtime/runtime.h:211-213
src/a5/runtime/tensormap_and_ringbuffer/runtime/runtime.h:86-111
src/a5/runtime/tensormap_and_ringbuffer/runtime/runtime.h:179-187
src/a5/platform/src/host/performance_collector.cpp:57-157
src/a5/platform/src/host/tensor_dump_collector.cpp:45-156
src/a5/platform/src/aicpu/performance_collector_aicpu.cpp:40-118
src/a5/platform/src/aicpu/performance_collector_aicpu.cpp:132-181
src/a5/platform/src/aicpu/tensor_dump_aicpu.cpp:36-57
src/a2a3/platform/sim/host/device_runner.cpp:312-376
src/a2a3/platform/onboard/host/device_runner.cpp:522-603
docs/testing.md:73-117
docs/task-flow.md:30-32
docs/task-flow.md:185-190
docs/profiling-name-map.md:132-163
Proposed Fix
- Introduce a first-class umbrella config for diagnostics/profiling with explicit sub-features (perf, dump_tensor, pmu) instead of overloading enable_profiling to mean perf only.
- At the CLI/API layer, make perf explicit. If backward compatibility is required, keep --enable-profiling / enable_profiling only as a compatibility alias to the perf sub-feature and document the deprecation path.
- Move perf-specific state and memory layout ownership out of generic runtime naming. Generic runtime/common ABI should carry only feature-agnostic diagnostics hooks or flags; perf collector pointers and buffer layout should stay in platform diagnostics components, aligned with dump tensor and PMU.
- Extract shared lifecycle logic across perf, dump tensor, and PMU into reusable helpers or components: feature flag encoding/publication, collector init/finalize contract, host/device buffer allocation and copy-back pattern, artifact naming policy, and SceneTest post-processing/export hooks.
- Update docs so profiling is consistently the umbrella term and perf refers only to the swimlane/perf data path.
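
A minimal sketch of the umbrella config and the compatibility alias, assuming a Python front-end similar to SceneTest. The names `ProfilingConfig`, `DiagFeature`, and the `--profile-perf` / `--profile-dump-tensor` / `--profile-pmu` flags are placeholders invented for illustration, as are the bit values; only `--enable-profiling` exists today.

```python
import argparse
import warnings
from dataclasses import dataclass
from enum import IntFlag


class DiagFeature(IntFlag):
    """Hypothetical feature-flag encoding published to the runtime."""

    NONE = 0
    PERF = 1
    DUMP_TENSOR = 2
    PMU = 4


@dataclass(frozen=True)
class ProfilingConfig:
    """Umbrella config: profiling is the parent, perf is one sub-feature."""

    perf: bool = False
    dump_tensor: bool = False
    pmu: bool = False

    def feature_mask(self) -> int:
        mask = DiagFeature.NONE
        if self.perf:
            mask |= DiagFeature.PERF
        if self.dump_tensor:
            mask |= DiagFeature.DUMP_TENSOR
        if self.pmu:
            mask |= DiagFeature.PMU
        return int(mask)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # Placeholder flag names, one per sub-feature.
    parser.add_argument("--profile-perf", action="store_true")
    parser.add_argument("--profile-dump-tensor", action="store_true")
    parser.add_argument("--profile-pmu", action="store_true")
    # Existing flag, kept only as a deprecated alias for the perf sub-feature.
    parser.add_argument("--enable-profiling", action="store_true",
                        help="deprecated alias for --profile-perf")
    return parser


def config_from_args(argv) -> ProfilingConfig:
    args = build_parser().parse_args(argv)
    if args.enable_profiling:
        warnings.warn(
            "--enable-profiling is deprecated; use --profile-perf",
            DeprecationWarning,
            stacklevel=2,
        )
    return ProfilingConfig(
        perf=args.profile_perf or args.enable_profiling,
        dump_tensor=args.profile_dump_tensor,
        pmu=args.profile_pmu,
    )
```

For example, `config_from_args(["--enable-profiling"])` would enable only the perf sub-feature, so existing invocations keep their current behavior while the generic layers see a single feature mask instead of perf-named fields.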
Priority
Medium (minor risk, should fix in next few releases)