
Reference Only: Benchmark redesign (1662 Feedback)#1664

Draft
rlundeen2 wants to merge 7 commits into microsoft:main from rlundeen2:benchmark-redesign


@rlundeen2
Contributor

The fundamental architectural difference: 1662 treats models as a strategy dimension (permuting them into enum members), requiring two different strategy classes and a _prepare_strategies override to reconcile them.

This PR treats models as a runtime parameter (looping over them at create-time), keeping the strategy axis purely about technique selection, which is what it was designed for.
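
To make the "runtime parameter" idea concrete, here is a minimal sketch of looping over models at create-time. It is not the PR's code: `Technique`, `PlannedAttack`, `plan_attacks`, and the model names are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Technique(Enum):
    """Hypothetical strategy axis: it names techniques and nothing else."""

    TECHNIQUE_A = "technique_a"
    TECHNIQUE_B = "technique_b"


@dataclass(frozen=True)
class PlannedAttack:
    """Hypothetical record pairing one technique with one adversarial model."""

    technique: Technique
    adversarial_model: str


def plan_attacks(techniques, adversarial_models):
    """Pair every selected technique with every model at create-time.

    The models never become enum members, so the strategy enum stays the
    same size no matter how many models are benchmarked.
    """
    return [
        PlannedAttack(technique=t, adversarial_model=m)
        for t, m in product(techniques, adversarial_models)
    ]


plans = plan_attacks(list(Technique), ["model-a", "model-b"])
```

With two techniques and two models this yields four planned attacks, while the enum still has only two members. Permuting models into the enum would instead need one member per (technique, model) pair.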

Comment thread pyrit/scenario/scenarios/benchmark/benchmark.py Outdated
rlundeen2 and others added 2 commits April 28, 2026 10:38
Replace static BENCHMARK_TECHNIQUES list with _get_benchmarkable_specs()
that filters SCENARIO_TECHNIQUES using two criteria:
- _accepts_adversarial(attack_class): technique CAN use adversarial model
- adversarial_chat is None: technique does NOT have one baked in

New adversarial techniques added to SCENARIO_TECHNIQUES are auto-discovered.
Fix test to use _adversarial_chat private attr on AtomicAttack.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
]


def _build_benchmark_strategy() -> type[ScenarioStrategy]:
Contributor Author

So much of the strategy is shared with rapid response that these two functions could likely use a common helper, e.g. build_strategy_from_techniques.
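
One possible shape for such a helper is sketched below. It assumes `ScenarioStrategy` is an `Enum` subclass and uses a bare stand-in for it. The helper's signature, the `ALL` aggregate member, and the technique names are hypothetical, not taken from the PR.

```python
from enum import Enum


class ScenarioStrategy(Enum):
    """Stand-in for the real base class (assumed here to be an Enum)."""


def build_strategy_from_techniques(class_name, technique_names):
    """Build a strategy enum with one member per technique name.

    Hypothetical helper: both scenarios would call it with their own class
    name and technique list instead of duplicating the enum-building code.
    """
    members = {"ALL": "all"}  # assumed aggregate member, for illustration
    members.update({name.upper(): name for name in technique_names})
    # Enum's functional API creates a new subclass of a member-less base.
    return ScenarioStrategy(class_name, members)


BenchmarkStrategy = build_strategy_from_techniques(
    "BenchmarkStrategy", ["technique_a", "technique_b"]
)
member_names = [member.name for member in BenchmarkStrategy]
```

The functional `Enum` API keeps the helper short. If the real strategy members carry extra data such as tags, the member values would need to carry that data as well.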

@rlundeen2 rlundeen2 changed the title Benchmark redesign (1662 Feedback) Reference Only: Benchmark redesign (1662 Feedback) Apr 28, 2026