
[FEAT] Add performance benchmarking and profiling infrastructure #970

@Stevengre

Description


Motivation

As the MIR semantics grows in complexity and test coverage expands (see #964), there is no systematic way to detect performance regressions or identify bottlenecks in symbolic execution. Issue #655 shows that performance problems have already surfaced (slow use functions), but we lack the tooling to investigate them systematically or to catch regressions automatically.

Proposed Feature

1. Benchmark Suite

Define a curated set of prove-rs test cases as a benchmark suite — a subset of the existing integration/data/prove-rs/ programs chosen to cover representative workloads (arithmetic, branching, enums, closures, iterators, etc.).

Each benchmark would record:

  • Wall-clock time for kmir prove-rs
  • Number of proof steps / rewrite rule applications (via K's --statistics output)
  • Peak memory usage

Results are stored as a JSON artifact (e.g., bench-results.json) committed or uploaded as a workflow artifact for comparison.
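The per-benchmark record could be produced by a small runner along these lines. This is only a sketch: `run_benchmark`, the `noop` case, and the exact JSON shape are illustrative (a real entry would invoke `kmir prove-rs` on a file under `integration/data/prove-rs/`), and the `ru_maxrss` reading is Unix-specific.

```python
import json
import resource
import subprocess
import sys
import time


def run_benchmark(name: str, cmd: list[str]) -> dict:
    """Run one benchmark command; record wall-clock time and peak child RSS."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    elapsed = time.monotonic() - start
    # ru_maxrss aggregates over all waited-for children (KiB on Linux).
    peak_rss_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {
        "name": name,
        "wall_clock_s": round(elapsed, 3),
        "exit_code": proc.returncode,
        "peak_rss_kib": peak_rss_kib,
    }


if __name__ == "__main__":
    # Hypothetical case list; stands in for real kmir prove-rs invocations.
    cases = {"noop": [sys.executable, "-c", "pass"]}
    results = [run_benchmark(n, c) for n, c in cases.items()]
    with open("bench-results.json", "w") as f:
        json.dump(results, f, indent=2)
```

Step counts from K's `--statistics` output would be parsed from the captured stdout and added as an extra field per record.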

2. Python-level Profiling via pyk.testing.Profiler

pyk already ships a cProfile-based Profiler class (pyk/testing/_profiler.py). We can integrate it into the pytest integration-test runner to generate .prof files for selected tests. A --profile pytest option (or a dedicated make profile-integration target) would enable this on demand.
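To illustrate what the generated profiles enable, the sketch below uses the standard library's `cProfile`/`pstats` directly rather than the pyk Profiler API (whose exact interface is not reproduced here); `profile_to_file` and `workload` are hypothetical names:

```python
import cProfile
import pstats
from pathlib import Path


def profile_to_file(func, prof_path: Path, *args, **kwargs):
    """Run func under cProfile and dump the stats to a .prof file."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    profiler.dump_stats(prof_path)
    return result


def workload():
    # Toy stand-in for a slow integration test.
    return sum(i * i for i in range(10_000))


out = Path("workload.prof")
profile_to_file(workload, out)
# The .prof file can be inspected with pstats or a viewer like snakeviz.
pstats.Stats(str(out)).sort_stats("cumulative").print_stats(5)
```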

3. GitHub Actions Workflow: benchmark.yml

A new workflow triggered on:

  • push to master — baseline tracking
  • workflow_dispatch — on-demand profiling for PRs under investigation
  • Optionally: pull_request with a label like perf to gate on demand

Steps:

  1. Build stable-mir-json + kmir (reuse Docker setup from test.yml)
  2. Run the benchmark suite, capturing timing/step counts
  3. Upload bench-results.json as a workflow artifact
  4. On master push: compare against the previous baseline stored in a GitHub Actions cache and post a summary to the job summary ($GITHUB_STEP_SUMMARY)
  5. Optionally: use benchmark-action/github-action-benchmark to track trends over time and comment on PRs when a regression threshold is exceeded
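Step 4's baseline comparison could be a short script along these lines. This is a sketch only: `compare`, the flat `{name: seconds}` JSON shape, the `baseline.json` filename, and the 10% regression threshold are all assumptions, not part of the proposal.

```python
import json
import os
from pathlib import Path


def compare(baseline: dict, current: dict, threshold: float = 0.10) -> list[str]:
    """Build markdown table rows comparing wall-clock times per benchmark."""
    rows = [
        "| benchmark | baseline (s) | current (s) | delta |",
        "|---|---|---|---|",
    ]
    for name, cur in sorted(current.items()):
        base = baseline.get(name)
        if base is None:
            rows.append(f"| {name} | n/a | {cur:.2f} | new |")
            continue
        delta = (cur - base) / base
        flag = " (regression)" if delta > threshold else ""
        rows.append(f"| {name} | {base:.2f} | {cur:.2f} | {delta:+.1%}{flag} |")
    return rows


if __name__ == "__main__":
    base_path = Path("baseline.json")
    cur_path = Path("bench-results.json")
    if base_path.exists() and cur_path.exists():
        table = "\n".join(
            compare(json.loads(base_path.read_text()), json.loads(cur_path.read_text()))
        )
        # In CI, append to the job summary; locally, just print.
        summary = os.environ.get("GITHUB_STEP_SUMMARY")
        if summary:
            Path(summary).write_text(table + "\n")
        else:
            print(table)
```

Writing the table to the file named by `$GITHUB_STEP_SUMMARY` is how GitHub Actions renders markdown in the job summary.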

4. make benchmark Target

A local Makefile target for developers to run the benchmark suite locally and view a summary, mirroring CI behaviour.

Acceptance Criteria

  • Benchmark suite defined (list of representative test cases with expected step counts)
  • make benchmark target runs locally and outputs a timing/step-count table
  • --profile mode generates .prof files for pytest integration tests using pyk.testing.Profiler
  • benchmark.yml GitHub Actions workflow uploads benchmark artifacts on every master push
  • CI posts a step-summary table comparing current vs. baseline results
  • Documentation added to docs/dev/ on how to run benchmarks and interpret results
