Motivation
As the MIR semantics grows in complexity and test coverage expands (see #964), there is no systematic way to detect performance regressions or identify bottlenecks in symbolic execution. Issue #655 shows that performance problems have already surfaced (slow `use` functions), but we lack the tooling to investigate them systematically or to catch regressions automatically.
Proposed Feature
1. Benchmark Suite
Define a curated set of prove-rs test cases as a benchmark suite — a subset of the existing `integration/data/prove-rs/` programs chosen to cover representative workloads (arithmetic, branching, enums, closures, iterators, etc.).
Each benchmark would record:
- Wall-clock time for `kmir prove-rs`
- Number of proof steps / rewrite rule applications (via K's `--statistics` output)
- Peak memory usage
Results are stored as a JSON artifact (e.g., `bench-results.json`), committed or uploaded as a workflow artifact for comparison.
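The recording step could be sketched as a small Python runner. Everything below is illustrative: the `run_benchmark` helper and JSON schema are assumptions, not an existing kmir API, and step-count/memory capture is stubbed out in comments.

```python
import json
import subprocess
import sys
import time
from pathlib import Path

def run_benchmark(name: str, cmd: list[str]) -> dict:
    """Run one benchmark command and record its wall-clock time.
    Step counts would come from parsing K's --statistics output, and
    peak memory from e.g. `/usr/bin/time -v`; both are omitted here."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return {"name": name, "wall_clock_s": round(time.perf_counter() - start, 3)}

# Trivial no-op command standing in for `kmir prove-rs <program>`.
results = [run_benchmark("noop", [sys.executable, "-c", "pass"])]
Path("bench-results.json").write_text(json.dumps({"benchmarks": results}, indent=2))
```

In CI the same script would iterate over the curated benchmark list and upload the resulting JSON file as the workflow artifact.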
2. Python-level Profiling via pyk.testing.Profiler
pyk already ships a cProfile-based `Profiler` class (`pyk/testing/_profiler.py`). We can integrate it into the pytest integration-test runner to generate `.prof` files for selected tests, enabling:
- Identification of hot Python functions in the SMIR→K transformation pipeline (`kmir.py`, `smir.py`)
- Easy investigation of issues like [INVESTIGATE] Why use functions slow down the symbolic execution #655 without manual instrumentation
A `--profile` pytest option (or a dedicated `make profile-integration` target) would enable this on demand.
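As a rough illustration of what this mode would produce, here is a minimal stand-in built directly on `cProfile` (the stdlib mechanism pyk's `Profiler` wraps). The `profiled` helper, the profiled loop, and the output filename are all hypothetical:

```python
import cProfile
import pstats
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def profiled(out: Path):
    """Profile the enclosed block with cProfile and dump a .prof file,
    analogous to what pyk's Profiler does around a pytest test body."""
    prof = cProfile.Profile()
    prof.enable()
    try:
        yield
    finally:
        prof.disable()
        prof.dump_stats(out)

# Hypothetical hot section standing in for the SMIR->K transformation.
with profiled(Path("test_arith.prof")):
    total = sum(i * i for i in range(100_000))

# The .prof file can then be inspected with pstats (or snakeviz):
stats = pstats.Stats("test_arith.prof").sort_stats("cumulative")
```

The generated `.prof` files would be the per-test artifacts that a `--profile` run leaves behind for offline analysis.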
3. GitHub Actions Workflow: benchmark.yml
A new workflow triggered on:
- `push` to `master` — baseline tracking
- `workflow_dispatch` — on-demand profiling for PRs under investigation
- Optionally: `pull_request` with a label like `perf` to gate on demand
Steps:
- Build `stable-mir-json` + `kmir` (reuse Docker setup from `test.yml`)
- Run the benchmark suite, capturing timing/step counts
- Upload `bench-results.json` as a workflow artifact
- On `master` push: compare against the previous baseline stored in a GitHub Actions cache and post a summary to the job summary (`$GITHUB_STEP_SUMMARY`)
- Optionally: use `benchmark-action/github-action-benchmark` to track trends over time and comment on PRs when a regression threshold is exceeded
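The baseline-comparison step could be a small Python script whose Markdown output is appended to `$GITHUB_STEP_SUMMARY`. The `compare` helper, its input shape (benchmark name to wall-clock seconds), and the 10% regression threshold below are illustrative choices, not decided details:

```python
def compare(baseline: dict[str, float], current: dict[str, float],
            threshold: float = 0.10) -> str:
    """Render a Markdown table of current vs. baseline wall-clock times,
    flagging any benchmark that regressed by more than `threshold`."""
    rows = [
        "| benchmark | baseline (s) | current (s) | delta |",
        "|---|---|---|---|",
    ]
    for name, cur in sorted(current.items()):
        base = baseline.get(name)
        if base is None:
            rows.append(f"| {name} | - | {cur:.3f} | new |")
            continue
        delta = (cur - base) / base
        flag = " :warning:" if delta > threshold else ""
        rows.append(f"| {name} | {base:.3f} | {cur:.3f} | {delta:+.1%}{flag} |")
    return "\n".join(rows)

# Hypothetical data, as loaded from the cached baseline and the freshly
# produced bench-results.json in CI.
baseline = {"arith": 1.00, "enums": 2.00}
current = {"arith": 1.02, "enums": 2.60}
print(compare(baseline, current))  # append this output to $GITHUB_STEP_SUMMARY
```

Exceeding the threshold could additionally fail the job (or just warn) depending on how strictly we want to gate merges.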
4. make benchmark Target
A local Makefile target for developers to run the benchmark suite locally and view a summary, mirroring CI behaviour.
Acceptance Criteria
- Benchmark suite defined (list of representative test cases with expected step counts)
- `make benchmark` target runs locally and outputs a timing/step-count table
- `--profile` mode generates `.prof` files for pytest integration tests using `pyk.testing.Profiler`
- `benchmark.yml` GitHub Actions workflow uploads benchmark artifacts on every `master` push
- CI posts a step-summary table comparing current vs. baseline results
- Documentation added to `docs/dev/` on how to run benchmarks and interpret results
Related
- [INVESTIGATE] Why use functions slow down the symbolic execution #655 — slow `use` functions investigation
- Incrementally integrate external Rust test suites into mir-semantics CI #964 — external test suite integration (benchmark suite could overlap)
- `pyk/testing/_profiler.py` — existing profiler infrastructure available in the `pyk` dependency