[DRAFT] FEAT: Benchmark Scenario #1662
Conversation
    adversarial_models: list[PromptChatTarget] | None = None,
) -> tuple[type[ScenarioStrategy], dict[str, str], list[AttackTechniqueSpec]]:
    """
    Build the Benchmark strategy class dynamically from SCENARIO_TECHNIQUES.
I think we can replace these at the factory level and simplify things a bunch. I'm going to take a stab at it.
There might be ways to simplify so we don't need to override _get_atomic_attacks_async either, but for now I think something like this would be good.
The fundamental architectural difference: this PR treats models as a strategy dimension (permuting them into enum members), requiring two different strategy classes and a _prepare_strategies override to reconcile them.
#1664 treats models as a runtime parameter (looping at create-time), keeping the strategy axis purely about technique selection — which is what it was designed for.
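For concreteness, a rough sketch of the create-time loop I mean. AttackTechniqueSpec and PromptChatTarget are the types from this PR, but the helper name, its signature, and the assumption that specs are dataclasses with an adversarial_target field are just illustrative:

```python
from dataclasses import replace


def build_attack_specs(
    technique_specs: list["AttackTechniqueSpec"],
    adversarial_models: list["PromptChatTarget"] | None = None,
) -> list["AttackTechniqueSpec"]:
    # Create-time loop: the strategy axis stays purely about technique
    # selection, and each technique spec is duplicated once per adversarial
    # model instead of models being permuted into new enum members.
    if not adversarial_models:
        return list(technique_specs)
    return [
        replace(spec, adversarial_target=model)  # assumes a dataclass spec with this field
        for spec in technique_specs
        for model in adversarial_models
    ]
```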
if adversarial_models:
    permuted_specs = []
    for model in adversarial_models:
Are model names definitely unique? Just thinking that if we have two models with the same name, we currently have a slight issue: e.g. with two "gpt-4o" model names we end up with two identical technique names, so the second model gets overwritten without any warning or error. Maybe we add a suffix to ensure unique names, or check for model label collisions early and raise a warning so it isn't silent?
(Oh, Rich's suggestion might remove this issue.)
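If we keep the list-of-targets shape and the issue survives that change, a rough sketch of the early check, assuming we already have one label per target:

```python
from collections import Counter


def _validate_model_labels(labels: list[str]) -> None:
    # Fail loudly on duplicate labels instead of letting the second
    # "gpt-4o" silently overwrite the first one's technique entry.
    duplicates = [label for label, count in Counter(labels).items() if count > 1]
    if duplicates:
        raise ValueError(
            f"Adversarial model labels must be unique; duplicates: {duplicates}. "
            "Rename the targets or pass explicit labels."
        )
```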
Description
Adds a benchmarking scenario to PyRIT to compare performance across adversarial targets. This is currently a draft PR, and there are several design conflicts to resolve before opening it for review.
The largest design tension is that get_strategy_class doesn't work with the factory pattern for scenario strategy generation, because for benchmarks the scenario instance itself changes the scenario strategy. The working solution is to intercept the lifecycle at several points in the scenario (_build_benchmark_strategy => _prepare_strategies => _get_atomic_attacks). This works but is very brittle: callers like registries see a "blank" version of the strategy, while at runtime the strategy is fully populated with live adversarial targets.
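For readers skimming, the interception chain is roughly the skeleton below; the method names are the ones from this PR, but the base class, signatures, and bodies are elided:

```python
class BenchmarkScenario:  # placeholder; the real class derives from the scenario base class
    def _build_benchmark_strategy(self, adversarial_models=None):
        # Builds the "blank" strategy class plus defaults and technique specs;
        # this is all that registries (and a future CLI) ever get to see.
        ...

    def _prepare_strategies(self, requested_strategies):
        # At runtime, swaps the blank strategy members for ones bound to the
        # live adversarial targets.
        ...

    async def _get_atomic_attacks_async(self):
        # Materializes one atomic attack per (technique, adversarial model) pair.
        ...
```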
We explicitly filter out non-adversarial attack strategies using a list of attack names in _build_benchmark_strategy, but this is also brittle. We have options for richer tagging. A cheap intervention could be to check whether adversarial_target is an attribute of the attack type. Another could be to use TargetCapabilities and add an is_adversarial tag, which the attack could pass through to the caller in the scenario. As-is, we're just keeping a literal list of attacks we know have adversarial targets.
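A rough sketch of that cheap attribute check, assuming we can inspect the attack class's constructor (the predicate name is made up):

```python
import inspect


def _supports_adversarial_target(attack_cls: type) -> bool:
    # Cheap check: instead of a hard-coded name list, ask whether the attack's
    # constructor takes an adversarial target at all. The parameter name is the
    # one from the description above; real attack classes may spell it differently.
    return "adversarial_target" in inspect.signature(attack_cls.__init__).parameters
```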
The original requirements asked for a list[PromptChatTarget] in the constructor. The issue is that targets don't know they're adversarial, so we need to label them with a human-readable name. model_name isn't guaranteed, and similar fields don't exist on the target, so we fall back on the identifier. Not a great design in my opinion; inferring the model name from a private attribute is also a yellow flag. We could change the constructor to accept dict[str, PromptChatTarget], where the key is a human-readable name, but that's less ergonomic.
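A sketch of how the dict-shaped constructor could normalize its input, assuming the target exposes an identifier via get_identifier(); the helper names and the model_name fallback are illustrative:

```python
def _normalize_adversarial_models(
    adversarial_models: dict[str, "PromptChatTarget"] | list["PromptChatTarget"] | None,
) -> dict[str, "PromptChatTarget"]:
    # Accept both shapes: an explicit {label: target} mapping, or a bare list
    # where we fall back on whatever the target itself exposes.
    if not adversarial_models:
        return {}
    if isinstance(adversarial_models, dict):
        return dict(adversarial_models)
    return {_fallback_label(target): target for target in adversarial_models}


def _fallback_label(target: "PromptChatTarget") -> str:
    # Prefer a human-readable model_name if the target happens to have one;
    # otherwise fall back on its identifier, as described above.
    model_name = getattr(target, "model_name", None)
    return model_name if model_name else str(target.get_identifier())
```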
There's explicitly no CLI support, and there can't be because of the get_strategy_class issue. This will have downstream implications for the GUI that I'd like to fix.
Scenarios are designed to be plug-and-play. Do we need a list of default adversarial targets?
_build_benchmark_strategy is a huge function that returns a 3-tuple and should be refactored. It does too much, but I'm not sure how to refactor it while keeping it similar to rapid response.
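One small step that doesn't depend on the rest of the design: name the 3-tuple so callers stop unpacking by position. A sketch, not intended as the final shape:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkStrategyBundle:
    # Named container for what _build_benchmark_strategy currently returns as a
    # bare 3-tuple; splitting the function body itself would be a separate step.
    strategy_cls: type["ScenarioStrategy"]
    defaults: dict[str, str]
    technique_specs: list["AttackTechniqueSpec"]
```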
TBD whether this should get an integration test in this PR.
Tests and Documentation
Added tests/unit/scenario/test_benchmark.py.