Skip to content

FEAT text adaptive scenario#1760

Open
hannahwestra25 wants to merge 11 commits into
microsoft:mainfrom
hannahwestra25:hawestra/text_adaptive_scenario
Open

FEAT text adaptive scenario#1760
hannahwestra25 wants to merge 11 commits into
microsoft:mainfrom
hannahwestra25:hawestra/text_adaptive_scenario

Conversation

@hannahwestra25
Copy link
Copy Markdown
Contributor

Add Adaptive Scenario Framework with TextAdaptive

Summary

Introduces an adaptive scenario framework that picks attack techniques per-objective using an epsilon-greedy bandit informed by observed success rates, rather than running every selected technique against every objective. Concentrates spend on techniques that actually work against the target, and stops early on first success.

Adds:

  • AdaptiveScenario — modality-agnostic base class
  • TextAdaptive — concrete text-attack subclass
  • AdaptiveTechniqueSelector — epsilon-greedy selector with Laplace-smoothed estimates and pooled cross-context backoff
  • AdaptiveDispatchAttack — per-objective dispatch strategy
  • Walkthrough notebook + .py doc
  • Unit tests (63 tests across selector, dispatcher, and scenario)

Motivation

Static scenarios are O(techniques × objectives): every technique runs against every objective regardless of whether earlier attempts already succeeded or whether the technique is known to be ineffective against the target. For evaluation runs with many techniques and many objectives, this wastes spend on combinations that aren't informative.

Adaptive scenarios reduce this to O(max_attempts × objectives) by:

  • learning from observed outcomes,
  • exploiting techniques that work on the target,
  • still exploring (with probability epsilon) so the table doesn't collapse onto a single technique prematurely,
  • stopping per-objective on first success.

How it works

For each objective the dispatcher loops up to max_attempts_per_objective times:

  1. Select — with probability epsilon pick a random technique, otherwise pick the one with the highest Laplace-smoothed success estimate (s + 1) / (n + 1). Cells with fewer than pool_threshold local observations fall back to the technique's pooled rate across all contexts (cold-start handling).
  2. Execute — run the chosen technique against the bound SeedAttackGroup, merging the technique's seed_technique if it declares one.
  3. Record — update the selector's (context, technique) → (successes, attempts) table and stop early on success.

The selector is shared by reference across all per-objective dispatchers in a scenario run, so learning accumulates globally. The per-objective context key is derived by a ContextExtractor; global_context (default) shares one table across all objectives, harm_category_context partitions by harm category.

Public API

from pyrit.scenario.scenarios.adaptive import TextAdaptive, harm_category_context

scenario = TextAdaptive(
    epsilon=0.2,
    pool_threshold=3,
    max_attempts_per_objective=3,
    seed=42,                                  # reproducible selection
    context_extractor=harm_category_context,  # optional per-category learning
)
await scenario.initialize_async(objective_target=target)
result = await scenario.run_async()

Adaptive scenarios are also resumable — pass scenario_result_id="..." to the constructor and prior dispatch trails are replayed into the selector before the remaining objectives run.

notes

  • BASELINE_POLICY = Forbiddenprompt_sending participates as one of the selector's techniques rather than being prepended as a guaranteed baseline. Mixing a forced baseline with adaptive selection would bias the table.
  • Per-objective compatibility filtering — techniques whose seed_technique is incompatible with a given seed group are dropped per-objective rather than globally, so a technique that's incompatible with one objective still participates for others. Objectives with no compatible techniques are skipped with a warning.
  • Resume rehydration — each attempt is persisted as a step in metadata["adaptive_attempts"] on the outer AttackResult; on resume these trails are replayed via record_outcome so the new selector starts with the prior run's learned state. Already-completed atomics are skipped by the base Scenario resume path (by atomic_attack_name), so there is no double-counting.
  • Two-row persistence per success — the inner technique persists its raw AttackResult via its own post-execute hook; the dispatcher returns a replace-based copy with a fresh attack_result_id/timestamp and the adaptive trail stamped onto metadata. Both rows share conversation_id. Documented on the dispatcher's class docstring.
  • Thread safetyAdaptiveTechniqueSelector guards its counts table with a threading.Lock so individual select / record_outcome operations are atomic. The overall select → execute → record sequence is not serialized.

Files

Area File
Base scenario pyrit/scenario/scenarios/adaptive/adaptive_scenario.py
Text subclass pyrit/scenario/scenarios/adaptive/text_adaptive.py
Selector + context extractors pyrit/scenario/scenarios/adaptive/selector.py
Per-objective dispatcher pyrit/scenario/scenarios/adaptive/dispatcher.py
Package wiring pyrit/scenario/scenarios/adaptive/__init__.py, pyrit/scenario/__init__.py
Walkthrough doc/code/scenarios/3_adaptive_scenarios.py, doc/code/scenarios/3_adaptive_scenarios.ipynb
Tests tests/unit/scenario/scenarios/adaptive/test_selector.py, tests/unit/scenario/scenarios/adaptive/test_dispatcher.py, tests/unit/scenario/scenarios/adaptive/test_text_adaptive.py

Testing

pytest tests/unit/scenario/scenarios/adaptive/ — 63 tests pass. Coverage includes:

  • selector exploration / exploitation / cold-start / pooled backoff / concurrent record_outcome
  • dispatcher early-stop, max-attempts retry, label propagation, context-label routing, fresh-result invariant
  • scenario atomic-attack emission, harm-category partitioning, per-objective seed-technique filtering, resume rehydration, baseline-policy enforcement

Comment on lines +80 to +87
return [
"airt_hate",
"airt_fairness",
"airt_violence",
"airt_sexual",
"airt_harassment",
"airt_misinformation",
"airt_leakage",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want a separate set of datasets


Owns selector wiring, dispatcher construction, per-objective atomic-attack
emission, and resume rehydration. Concrete subclasses (``TextAdaptive``,
future ``ImageAdaptive`` / ``AudioAdaptive``) only declare strategy class,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this base class really only makes sense because I was thinking there would be other modality adaptive scenarios

Copy link
Copy Markdown
Contributor

@rlundeen2 rlundeen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

rehydration are handled here.
"""

BASELINE_POLICY: ClassVar[BaselinePolicy] = BaselinePolicy.Forbidden
Copy link
Copy Markdown
Contributor

@rlundeen2 rlundeen2 May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(reposting inline review threads — original comments below were drafted locally and did not survive the review submission; reposting for visibility. Sorry it appears as if I'm having an AI convo with myself, which is not too dissimilar to normal life)

rehydration are handled here.
"""

BASELINE_POLICY: ClassVar[BaselinePolicy] = BaselinePolicy.Forbidden
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is new this release. I know not related to this PR, but can we think of a more descriptive name? I get that it's for the baseline, but because of the multiple uses of the word it is confusing to me. Is there something more descriptive we can think of?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the name is confusing — and I think part of the reason is that the enum is actually conflating two orthogonal axes:

  1. Is the baseline supported at all? (Forbidden vs the other two)
  2. What's the default inclusion behavior when it is supported? (Enabled = on by default, Disabled = off by default)

Calling that single thing BASELINE_POLICY with values Enabled/Disabled/Forbidden reads like a tri-state switch on the baseline itself, which collides with include_baseline=..., _include_baseline, _build_baseline_atomic_attack, and the literal "baseline" atomic name.

A few rename options, roughly in my order of preference:

  • BASELINE_SUPPORT: BaselineSupport with values AutoInclude / OptIn / Unsupported. Reads naturally as "what kind of support this scenario has for a baseline" and the value names directly answer "what does the caller have to do to get one?".
  • BASELINE_DEFAULT with values Included / Excluded / Unsupported. Similar idea, frames it as "the default behavior".
  • DEFAULT_BASELINE_BEHAVIOR — verbose but unambiguous; keeps the existing enum members.

If we wanted to be more aggressive and actually split the two axes, it'd be something like BASELINE_SUPPORTED: bool + BASELINE_DEFAULT_INCLUDED: bool — cleaner semantically but two attributes per scenario, and you lose the single "Forbidden" sentinel that's nice for the runtime override check.

My vote: BASELINE_SUPPORT / BaselineSupport.{AutoInclude, OptIn, Unsupported}. It reads correctly at the use site (BASELINE_SUPPORT = BaselineSupport.Unsupported for adaptive) and the value names tell you the actual user-visible contract without overlapping with include_baseline. Since this is class-level + an enum + one runtime kwarg check, the rename should be mechanical.

You're right it's pre-existing — worth filing as a separate cleanup PR rather than expanding this one's scope.

rehydration are handled here.
"""

BASELINE_POLICY: ClassVar[BaselinePolicy] = BaselinePolicy.Forbidden
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think we should include the baseline here by default

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a real tension here worth flagging before flipping it.

Argument for Enabled (your suggestion):

  • Consistency with every other scenario — users will expect a baseline column.
  • The "did adaptive actually help?" story only works with a comparison point: baseline X% solved vs. adaptive Y% solved is the headline metric users will want.
  • Without a baseline the only reference is the selector's internal prompt_sending success rate, which (a) isn't surfaced in the printer and (b) reflects whatever subset of objectives prompt_sending happened to get picked for, not all of them.

Argument for Forbidden (current):

  • prompt_sending is already one of the techniques the selector chooses from. Enabling baseline means every objective runs prompt_sending once unconditionally as baseline, plus possibly again as the selector's pick (typically the very first attempt, since unseen techniques tie at 1.0). That's redundant API spend, and on a successful baseline objective the adaptive loop still runs max_attempts_per_objective times — so cost goes from O(max_attempts × N) to O((max_attempts + 1) × N) with the extra call being a literal duplicate of an attempt that often happens anyway.
  • More subtly: the baseline outcomes aren't fed into the selector. So we pay for N extra prompt_sending calls and throw away the signal — which is the worst of both worlds for an "adaptive" scenario.

My recommendation: BaselinePolicy.Disabled, not Enabled.

  • Disabled keeps the baseline available (caller passes include_baseline=True to initialize_async when they want the comparison), but doesn't pay the cost by default. This matches Jailbreak's posture, where the baseline is meaningful but not always wanted.
  • If you do want Enabled as the default, the dispatcher really should seed the selector from the baseline results before the adaptive loop runs — otherwise you're burning calls that don't inform any decision. That's a non-trivial change to the dispatcher contract.

Either change is small (one constant), but Enabled-without-seeding is the option I'd actively push back on. Curious if there's a third path I'm missing — e.g. exposing the per-technique success table in the printer so users can see prompt_sending's rate without needing a separate baseline row.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's remove prompt_sending as one of the techniques

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this direction — it makes the model conceptually clean: baseline = "what does the target do unprovoked", techniques = "what actively-adversarial moves help". And it gives you the headline comparison metric users want without paying for duplicate calls. Worth doing.

A few things to work through if you go this way:

1. Strategy enum filtering. TextAdaptive.get_strategy_class() builds from SCENARIO_TECHNIQUES, which includes prompt_sending. You'll need to filter it out at build time — either:

  • Pass a TagQuery that excludes prompt_sending in build_strategy_class_from_specs, or
  • Filter the spec list before passing it in.

The cleaner path is probably to add an exclusion list to _build_text_adaptive_strategy so future "trivial" techniques can be excluded the same way.

2. The default and single_turn aggregates. If prompt_sending is tagged single_turn and default in the central catalog, --strategies single_turn on TextAdaptive will silently mean something different than the same flag on Jailbreak/AIRT. Worth either documenting that explicitly or renaming the aggregate (e.g. single_turn_adaptive).

3. User explicitly requests prompt_sending. With it removed from the enum, --strategies prompt_sending would fail at strategy resolution — that's actually the right behavior (fail fast with a clear "prompt_sending runs as the baseline; use --include-baseline" message), but the error needs to be friendly.

4. Empty technique pool edge case. AdaptiveDispatchAttack.__init__ already raises on empty techniques, so a misconfigured run errors loudly. Good.

5. Bonus optimization the new design unlocks. With baseline-enabled, the dispatcher could short-circuit objectives where baseline already succeeded — no point running an adaptive loop on an objective the target already complied with unprompted. That'd be:

  • A meaningful cost win on weak targets,
  • A cleaner narrative ("adaptive only fires on objectives that resisted baseline"),
  • And it'd give a clean "uplift" metric: (adaptive_successes - baseline_successes) / baseline_failures.

This is optional, but it's the kind of thing that makes the scenario distinctively "adaptive" rather than "tries multiple techniques".

6. Selector seeding from baseline. Less compelling once prompt_sending is out of the pool — the baseline result doesn't directly inform any technique's success rate. So you can skip the seeding work I mentioned in my previous comment.

Net: I think prompt_sending out + BaselinePolicy.Enabled + (eventually) baseline-success short-circuit is the right end state. The first two are small changes; the short-circuit is a follow-up.

objective_scorer: TrueFalseScorer | None = None,
epsilon: float = 0.2,
pool_threshold: int = 3,
max_attempts_per_objective: int = 3,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think max_attempts_per_objective and probably epsilon should be exposed via parameters

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly agree, and I'd extend it: all of epsilon, pool_threshold, max_attempts_per_objective, and seed should go through supported_parameters().

Two reasons it's more than a CLI nicety:

1. CLI/config reachability. Right now there's no way to tune any of these from pyrit_scan or a YAML config — you'd have to write Python to instantiate TextAdaptive(epsilon=0.5). That defeats most of the point of a configurable scenario.

2. Resume safety (this is the bigger one). Scenario._validate_stored_scenario only compares self.params. Since none of the adaptive knobs land in self.params today, a user can pass a scenario_result_id with a completely different epsilon / max_attempts_per_objective / seed and resume succeeds silently — but the new run's selection behavior won't match what produced the stored trail. Routing them through supported_parameters() makes the validator reject divergence automatically.

Concrete shape (matching the existing ScamScenario pattern):

@classmethod
def supported_parameters(cls) -> list[Parameter]:
    return [
        Parameter(name="epsilon", description="Exploration probability [0.0, 1.0].",
                  param_type=float, default=0.2),
        Parameter(name="max_attempts_per_objective", description="Max techniques tried per objective.",
                  param_type=int, default=3),
        Parameter(name="pool_threshold", description="Min per-(context, technique) attempts before "
                  "the local estimate overrides the pooled rate.",
                  param_type=int, default=3),
        Parameter(name="seed", description="RNG seed for deterministic selection.",
                  param_type=int, default=None),
    ]

Then in __init__, prefer self.params["epsilon"] over the kwarg when both are present (or just drop the kwargs in favor of self.params lookups — apply_defaults will fill them in from declared defaults). The validation/coercion in set_params_from_args will reject bad values (e.g. epsilon=2.0) before AdaptiveTechniqueSelector.__init__ ever raises, which gives users a friendlier error path.

One thing to skip: context_extractor shouldn't be a CLI parameter — it's a callable. If you want it CLI-reachable, expose a string enum ("global" | "harm_category") and resolve internally. Keep the callable for programmatic use.


techniques = self._build_techniques_dict(objective_target=self._objective_target)

selector = AdaptiveTechniqueSelector(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to name this more specifically, because I could envision different types of selectors also. But maybe this is a future problem. Nit only

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, and I'd actually push slightly harder than "nit" — this is cheap to fix now and expensive to fix later (the class name is part of the public __init__.py surface).

Concrete suggestion:

  1. Rename the concrete class to EpsilonGreedyTechniqueSelector (or EpsilonGreedySelector). The current name describes what role it plays in the scenario, not what algorithm it implements — and the algorithm is what callers will care about when picking between selectors.

  2. Extract a TechniqueSelector Protocol (or ABC) capturing the surface the dispatcher actually depends on — just select(*, context, techniques) -> str and record_outcome(*, context, technique, success). The dispatcher and AdaptiveScenario type-hint against the Protocol, the concrete class is one implementation.

  3. Plumb selector choice as a constructor arg. AdaptiveScenario.__init__(..., selector: TechniqueSelector | None = None) defaulting to EpsilonGreedyTechniqueSelector(epsilon=..., pool_threshold=..., rng=...). That immediately unlocks future selectors (UCB1, Thompson sampling, contextual bandits, even a plug-in for a tuned policy) without subclassing AdaptiveScenario.

  4. The rehydration hook (_rehydrate_selector_from_memory) needs to become selector-aware, since UCB-style selectors care about timestamps, Thompson sampling needs Beta posterior parameters, etc. For v1 you can document that only epsilon-greedy is rehydratable and others start fresh — but it's worth a TODO so it doesn't surprise the next contributor.

Steps 1–3 are mechanical and worth doing in this PR (keeps the public API stable for v1). Step 4 is genuinely future work.

The epsilon/pool_threshold constructor args on AdaptiveScenario then become awkward — they're epsilon-greedy specific. Either:

  • Drop them from the scenario constructor entirely and require callers wanting non-defaults to pass a constructed selector=..., or
  • Keep them as ergonomic shortcuts that only apply to the default selector (raise if selector= is also passed with these).

I'd go with the first — simpler contract, and once you have selector= as the extension point, the kwargs are redundant sugar.

)
return None

adaptive_context = self._context_extractor(seed_group)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a different solution for this problem here: #1758; you may want to merge in so we don't have two ways of doing this

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — #1758 solves the same problem (stable resume keys despite duplicate / re-fetched seed groups) at the framework layer, via the new scenario_data.objective_index column on AttackResultEntry. That's strictly better than what this PR does locally with objective_id-in-the-atomic-name.

Worth thinking about beyond just "merge in":

Big architectural implication if #1758 lands first. The "one AtomicAttack per objective" pattern in this PR (atomic_attack_name = f"{prefix}_{dataset_name}_{objective_id}") is itself a workaround for the resume-key problem that #1758 fixes. Once resume works at the per-objective-index level, this scenario can collapse back to one AtomicAttack per (dataset, technique-pool) with N seed groups, and let the executor's per-objective tracking handle resumption.

That collapse would:

  • Eliminate the objective_id-in-name code path entirely (deleting the dead uuid.uuid4() fallback I flagged in my main review).
  • Restore --max-concurrency parallelism across objectives — currently broken because each objective is its own atomic that runs sequentially under the scenario loop (this is issue Add poetry attributes #5 in my main review).
  • Make the dispatcher's persistence story simpler: one inner attack row per attempt, one outer dispatcher row per objective, all under the same atomic name with proper objective_index linkage.

Merge order matters. If #1758 merges first, this PR should be rewritten on top of it rather than just rebased. If this PR merges first, #1758 needs to handle the per-objective atomic naming convention gracefully (probably fine, but worth a heads-up to the #1758 author).

For now, I'd:

  1. Drop the uuid.uuid4() fallback regardless (dead in practice, fails resume if ever hit).
  2. Coordinate with FEAT: Better Scenario Tracking #1758 on landing order.
  3. After FEAT: Better Scenario Tracking #1758 lands, file a follow-up to collapse adaptive back to one-atomic-per-(dataset, pool) and pick up the parallelism win for free.

self._pool_threshold = pool_threshold
self._rng = rng if rng is not None else random.Random()
self._counts: dict[tuple[str, str], tuple[int, int]] = {}
# Per-technique pooled counts, kept in sync with ``_counts`` so the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we persist rng in a way that makes it resumable? It'd be good to have these reproducible. Because it's re-random each time it makes things differ.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — this is the right place to fix it. The current setup is reproducible within a from-scratch run but breaks down on resume, because AdaptiveScenario._get_atomic_attacks_async builds a fresh random.Random(self._seed) on every initialize_async call. After rehydration the counts table is correct but the RNG state is back at the start, so decisions past the resume point diverge from what an uninterrupted run would have produced.

Three options, increasing effort / increasing correctness:

1. Document the limitation only. Cheapest. Add to the seed docstring: "Determinism is per-run; resumed runs start with fresh RNG state and will diverge in selection (but not in learned counts) from an uninterrupted run." This is fine if "reproducibility" means "rerun from scratch with seed=X always gives the same result" and you don't promise resume continuity.

2. Derive RNG state per-decision (my preference). Move RNG out of selector state entirely; instead, in select, do something like:

def select(self, *, context: str, techniques: Sequence[str]) -> str:
    ...
    decision_rng = random.Random((self._seed, context, len(self._counts)))
    if decision_rng.random() < self._epsilon:
        return decision_rng.choice(technique_list)
    ...

…or even better, let the caller (AdaptiveDispatchAttack._perform_async) pass decision_key=(objective_id, attempt_idx) into select, and use that to seed a per-decision Random. Properties:

  • Resume continuity: the same (seed, objective_id, attempt_idx) always yields the same decision, regardless of how many other objectives ran before.
  • No state to persist — derived deterministically from inputs that are already on the AttackResult.
  • Loses the global-shuffle property (decisions for different objectives are no longer cross-correlated), which is usually a feature, not a bug, for reproducibility.

3. Persist rng.getstate() after each record_outcome and restore on rehydration. Most faithful — resume produces bit-identical continuation of the original run. Cost: serializing a Random state into the adaptive_attempts metadata trail (it's a tuple of ints, JSON-friendly), and a small amount of plumbing in _rehydrate_selector_from_memory. Worth it only if you genuinely care about "resumed = uninterrupted" equivalence.

My recommendation: Option 2. It's a ~10-line change, makes the reproducibility story honest in both directions, and avoids the persistence/format-versioning headache of Option 3.

If you go with Option 1 instead, please at minimum update the "Reproducible runs" cell in 3_adaptive_scenarios.py — right now it implies resume preserves determinism, which isn't quite true.


Raises:
ValueError: If ``self._objective_target`` is not set, or if
``_build_techniques_dict`` finds no usable techniques.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can do this but have increased concurrency? We may need to postpone. But right now concurrency is all within a single atomic attack, and with only one objective we lose that

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you've identified the real architectural cost of "one atomic per objective". Concrete paths, roughly in increasing effort:

1. Collapse to one atomic per (dataset, technique-pool) with N seed_groups. This is the right end state, but requires two changes:

  • Detach the dispatcher from a fixed seed_group. Today AdaptiveDispatchAttack.__init__ takes seed_group= and stashes it on self._seed_group. Move it onto the per-call context instead — either via a custom AttackParameters subclass that captures the original SeedAttackGroup, or by reading it from context.memory_labels / a context field populated by the executor.
  • Stable resume keys without objective-level atomic names. This is the piece that's blocked on FEAT: Better Scenario Tracking #1758. Once scenario_data.objective_index lands, per-objective resume works without the atomic-per-objective hack, and one atomic with N seed_groups gets you parallelism for free (executor's existing max_concurrency semaphore handles it).

2. Inside the atomic, parallelize per-attempt across objectives. A halfway option if #1758 is far off: keep the dispatcher's per-objective loop but launch K dispatcher instances concurrently via asyncio.gather with a semaphore. Implementation is moderate; resume keying still needs work.

3. Postpone. Defensible for v1 — ship the sequential version, file a follow-up, and pick it up after #1758 lands. The notebook should explicitly warn about it ("adaptive scenarios currently serialize objectives; --max-concurrency is a no-op").

Two algorithmic caveats that apply to any concurrent option — worth thinking about before you implement, even though they don't block:

  • Cold-start cohort effect. With N dispatchers selecting concurrently against a near-empty counts table, every dispatcher sees the same 1.0 ties and picks the same techniques (typically the first-tied or the seeded random tiebreak). The first cohort of K decisions happen before any record_outcome lands, so learning per cohort is reduced. Mitigations: (a) sequentially process a small "warmup" cohort to seed the table, then go concurrent; (b) switch the selector to Thompson sampling (posterior sampling naturally handles parallel decisions); (c) accept it as the cost of parallelism (probably fine in practice — exploration epsilon and unseen-technique optimism mostly cover it).
  • The threading.Lock in the selector is overkill for the asyncio path. asyncio is single-threaded, so select and record_outcome (both pure sync, no await) already complete atomically. The lock guards against true threading (e.g. someone wrapping the selector in asyncio.to_thread), which is fine to keep, but the real race is the select→execute→record window across coroutines, which no lock at the selector level can fix. That's the cohort effect above.

My recommendation: postpone, file a follow-up issue blocked on #1758, and add the no-op warning to the notebook now so users don't reach for --max-concurrency expecting it to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants