[WiP] CI test for automated attention benchmarking suite #897
amd-callumm wants to merge 4 commits into gfx11 from
Conversation
Force-pushed 227fd1f to 26cdf3c (Compare)
Signed-off-by: Callum Mitchell <callumm@amd.com>
Force-pushed 26cdf3c to c2ed5d9 (Compare)
Co-authored-by: Claude
Signed-off-by: Callum Mitchell <callumm@amd.com>
Force-pushed c2ed5d9 to ab95a95 (Compare)
Co-authored-by: Claude
Signed-off-by: Callum Mitchell <callumm@amd.com>
Signed-off-by: Callum Mitchell <callumm@amd.com>
Force-pushed ab95a95 to 6bc7e34 (Compare)
eble-amd left a comment:
This mostly looks good. Three things I recommend changing:
- rename the test function
- rename pct_change
- clarify config validation
The rest can be ignored.
def pytest_addoption(parser):
    """Add custom command-line options for attention benchmark tests."""
    parser.addoption(
        "--attn-bench-intermittent",
I don't demand that you change this before merging, but I'm giving notice that if you leave it to me, I will change these tests to use --intermittent from the top-level tests/conftest.py when I rebase my #898 onto your changes.
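For reference, a minimal sketch of what a shared option in the top-level tests/conftest.py could look like (the option name and help text are assumptions here, not the actual #898 change):

def pytest_addoption(parser):
    # Hypothetical shared flag; #898 may define it differently.
    parser.addoption(
        "--intermittent",
        action="store_true",
        default=False,
        help="Also validate cases marked as intermittent.",
    )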
def test_benchmark_regression(
    config_name: str,
This name is probably too general; there will be other benchmark regression tests.
)
def test_benchmark_regression(
# Build lookup tables by config key
actual_by_key = {make_config_key(e): e for e in actual_results}
golden_by_key = {make_config_key(e): e for e in golden_results}
matching_keys = set(actual_by_key.keys()) & set(golden_by_key.keys())
Above, the comments say,
Validation Rules:
- Entry count must match golden
- All configs in golden must exist in actual (order-independent)
but it isn't obvious where this validation occurs. It looks like cases that are unique to actual or golden are just ignored.
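A minimal sketch of explicit enforcement, reusing the lookup tables from the snippet above (the error message wording is mine):

# Sketch: fail loudly when actual and golden configs diverge,
# instead of silently intersecting them.
missing = golden_by_key.keys() - actual_by_key.keys()
unexpected = actual_by_key.keys() - golden_by_key.keys()
assert not missing and not unexpected, (
    f"Config mismatch vs golden: missing={sorted(missing)}, "
    f"unexpected={sorted(unexpected)}"
)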
if is_intermittent and not bench_intermittent:
    num_skipped_intermittent += 1
    continue
Skip earlier, maybe? If the performance won't be validated, why spend time running the test? If there's a good reason to do it this way, it's worth a comment; otherwise someone might waste time trying to change it before discovering why it is the way it is.
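Concretely, that would mean hoisting the check above the benchmark invocation, something like this (run_case is a hypothetical stand-in for the actual call into benchmark.py):

# Sketch: decide the skip before paying for the benchmark run.
if is_intermittent and not bench_intermittent:
    num_skipped_intermittent += 1
    continue
result = run_case(case)  # hypothetical wrapper around the subprocess call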
pct_change = (actual_mean - golden_mean) / golden_mean
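The summary above asks to rename pct_change, presumably because the expression yields a signed fraction rather than a percentage. Either direction makes the name and the value consistent (sketch):

# Option A: rename to match the value (a signed fraction).
rel_change = (actual_mean - golden_mean) / golden_mean
# Option B: keep the name and convert to an actual percentage.
pct_change = 100.0 * (actual_mean - golden_mean) / golden_mean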
# Subprocess timeout (seconds)
BENCHMARK_TIMEOUT = 900
This is not a blocking issue (because you're not running these tests in CI), but this is longer than the --timeout 300 in build-rocm-wheels.yml.
| """ | ||
| Get absolute path to attention benchmark directory |
According to Claude, "In practice, pytest on Python 3.12 sets __file__ to an absolute path, so it works — but the comments overclaim what the code ensures." Calling resolve() would make the code consistent with the comment, but then it might create problems by also resolving symlinks. Is it essential to use absolute paths here? Maybe just strike that word from the comments.
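For comparison, a sketch of the two variants (BENCH_DIR is a placeholder name):

from pathlib import Path

# As written: relies on pytest setting __file__ to an absolute path,
# which holds in practice but is not guaranteed by the code itself.
BENCH_DIR = Path(__file__).parent
# With resolve(): guaranteed absolute, but symlinks get resolved too.
BENCH_DIR = Path(__file__).resolve().parent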
Purpose
Leverage the existing pytest and (manual) attention backend benchmarking infrastructure to implement automated attention performance regression tests. Each test runs benchmark.py against a YAML config designed to imitate a model of interest (number of heads, head dimensions, etc.) while defining batch specs (input/output tokens, batch count) and the attention backends to run. The current set of tests covers the TRITON_ATTN and ROCM_AITER_UNIFIED_ATTN backends. Each model config includes a long-context prefill-only case, a decode-only case, and one prefill/decode combination of interest for Strix Halo. The YAML file also defines the number of warmup and benchmark iterations to run for each of these cases.
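To make the shape concrete, a hypothetical config could look roughly like this (every field name and value below is illustrative, not copied from the repo):

# Illustrative only; see tests/kernels/attention/benchmark/ for the real configs.
model:
  num_heads: 32
  num_kv_heads: 8
  head_dim: 128
backends: [TRITON_ATTN, ROCM_AITER_UNIFIED_ATTN]
warmup_iters: 10
bench_iters: 50
batch_specs:
  - {name: prefill_long, batch_count: 1, input_tokens: 8192, output_tokens: 1}
  - {name: decode_only, batch_count: 32, input_tokens: 1, output_tokens: 128}
  - {name: mixed, batch_count: 8, input_tokens: 1024, output_tokens: 128}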
The automated tests run each model config + batch spec + backend combination, with a 10-second cooldown between each to minimize the risk of thermal GPU throttling that could lead to unstable results. Each case's results are written to a JSON file under tests/kernels/attention/benchmark/output/<gfx_target>/, which is compared against a golden reference/baseline.
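In outline, the driver loop behaves like this (all names here are hypothetical; the real code shells out to benchmark.py with a timeout):

import time

COOLDOWN_SECONDS = 10  # guards against thermal throttling between cases

for case in all_cases:  # model config + batch spec + backend combinations
    result = run_case(case)         # hypothetical wrapper around benchmark.py
    write_json(result, output_dir)  # hypothetical; lands under output/<gfx_target>/
    time.sleep(COOLDOWN_SECONDS)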
Currently, these tests are Strix Halo only, but the infrastructure can easily support other platforms such as Strix Point.
Test cases can be marked as "skip" to avoid running the benchmarks, or as "intermittent" to flag tests that work but have unstable performance. Intermittent cases' performance is only compared to the golden reference when --attn-bench-intermittent is passed.
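Hypothetically, marking a case could look like this in the YAML (field names and placement are illustrative):

batch_specs:
  - {name: decode_only, batch_count: 32, input_tokens: 1, output_tokens: 128,
     intermittent: true}  # only validated when --attn-bench-intermittent is set
  - {name: mixed, batch_count: 8, input_tokens: 1024, output_tokens: 128,
     skip: true}          # benchmark not run at all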
For now, these tests are not run in any CI job (similar to @eble-amd, I saw far slower performance running on the CI machine compared to my local one; until this is understood, the CI job will not be useful).
Test command
pytest tests/kernels/attention/benchmark/test_benchmark_attention.py::test_benchmark_regression [--attn-bench-intermittent]

Test Result
Across 5 consecutive runs on my local machine, all 30 test cases (5 model configs * 3 batch specs * 2 backends) showed less than 10% variance in mean time-per-iteration compared to the goldens; 27 of them stayed under 1% variance in all 5 runs.
No tests currently require the skip or intermittent markers, but both paths were manually validated during development.
During test runs, I monitored my Strix Halo machine's GPU temperature at 5-second intervals and found that a 10-second cooldown was sufficient to keep the edge temperature below 65°C even across repeated runs of the test suite, well below typical thermal throttling thresholds. I haven't tried to push this interval any lower.