fix(grader): K-pair replay-window detection + multi-instance diagnostic for neg/016 #1035

Merged
bokelley merged 3 commits into main from claude/issue-1032-neg016-multi-instance-diagnostic
Apr 28, 2026

@bokelley
Contributor

Closes #1032

Vector neg/016-replayed-nonce previously sent exactly one (probe1, probe2) pair. Against multi-instance deployments (Fly anycast, AWS ALB, k8s replicas > 1) with per-process InMemoryReplayStore, the two probes could land on different instances, each with its own replay state, causing the vector to fail non-deterministically and emit a "got 200, expected 401" diagnostic that pointed at the verifier code rather than the deployment topology. This PR replaces the single pair with K configurable pairs (default 10), adds a --replay-probe-pairs CLI flag, and emits a self-routing diagnostic that names the cross-instance replay-store pattern when some pairs accept a replayed nonce.

What changed

  • gradeReplayWindow rewritten with a K-pair loop. Each pair generates a fresh nonce (scoped to that pair only) and uses a new TCP connection (probeSignedRequest already closes its undici Agent on completion). K/K rejected → PASS. 0/K rejected → FAIL with a "no replay protection" diagnostic that includes the multi-instance hint. Between 1 and K-1 of K rejected → FAIL with the partial-rejection count and a pointer to PostgresReplayStore or a Redis-backed ReplayStore implementation.
  • GradeOptions.replayProbePairs?: number — default 10, min 2; exposed as --replay-probe-pairs <N> on the CLI.
  • VectorGradeResult.replay_pairs_tried?: number and replay_pairs_rejected?: number — new optional fields emitted for neg/016 results.
  • Changeset: minor bump (new public API surface + behavioral default change).
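
The three outcomes above can be sketched as a small classifier. This is an illustration of the decision rule only, not the code in this PR: `ReplayVerdict`, `resolvePairCount`, and `classifyReplayPairs` are hypothetical names, the diagnostic strings are paraphrased, and clamping values below 2 (rather than rejecting them) is an assumption.

```typescript
// Illustrative sketch only; these names are hypothetical, not the grader's internals.
interface ReplayVerdict {
  passed: boolean;
  replay_pairs_tried: number;
  replay_pairs_rejected: number;
  diagnostic: string;
}

// Default 10, minimum 2 (per the description above).
// Assumption: out-of-range values are clamped to 2 rather than rejected.
function resolvePairCount(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) return 10;
  return Math.max(2, Math.floor(requested));
}

// rejected[i] is true when the replayed (second) probe of pair i was rejected.
function classifyReplayPairs(rejected: boolean[]): ReplayVerdict {
  const tried = rejected.length;
  if (tried === 0) throw new Error("at least one probe pair is required");
  const count = rejected.filter(Boolean).length;
  const base = { replay_pairs_tried: tried, replay_pairs_rejected: count };

  if (count === tried) {
    // K/K: every replayed nonce was rejected.
    return { ...base, passed: true, diagnostic: `${count}/${tried} replays rejected` };
  }
  if (count === 0) {
    // 0/K: nothing rejected; could be a missing store or strictly split routing.
    return {
      ...base,
      passed: false,
      diagnostic: `no replay protection: 0/${tried} replays rejected (or a per-process store behind a load balancer)`,
    };
  }
  // Partial: some instances saw the nonce before, some did not.
  return {
    ...base,
    passed: false,
    diagnostic: `${count}/${tried} replays rejected: likely a per-process replay store across multiple instances; use a shared store`,
  };
}
```

A partial count is the strongest signal of a topology problem, since a verifier with no replay check at all would never reject any pair.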

What was tested

  • npx tsc --project tsconfig.lib.json --noEmitOnError false — zero new errors (2 pre-existing config warnings unchanged)
  • New test file test/request-signing-grader-replay-window.test.js — 5 tests:
    • Single-instance verifier (shared replay store): K/K pass with replayProbePairs=4 and default 10
    • Multi-instance verifier (two alternating InMemoryReplayStores): partial-rejection FAIL with multi-instance diagnostic
    • No-op replay store: 0/K FAIL with "no replay protection" diagnostic naming pair count
    • Skipped vector: replay_pairs_tried/replay_pairs_rejected absent
  • Full runtime test suite requires node_modules (not installed in this environment); CI will run the complete suite
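
To make the single-instance and multi-instance cases concrete, here is a toy simulation of how routing affects the rejected-pair count. It is a sketch under stated assumptions, not the test file from this PR: `ToyReplayStore`, `makeVerifier`, `countRejectedPairs`, and the `Route` function are all hypothetical, and the real tests run against an HTTP server rather than an in-process function.

```typescript
// Hypothetical stand-in for a per-process replay store (not the library's InMemoryReplayStore).
class ToyReplayStore {
  private readonly seen = new Set<string>();

  // Returns true when the nonce was already recorded, i.e. the request is a replay.
  checkAndRecord(nonce: string): boolean {
    if (this.seen.has(nonce)) return true;
    this.seen.add(nonce);
    return false;
  }
}

// Maps the Nth request to an instance index; models the load balancer.
type Route = (requestIndex: number) => number;

// Builds a verifier backed by `instanceCount` independent stores.
function makeVerifier(instanceCount: number, route: Route): (nonce: string) => number {
  const stores = Array.from({ length: instanceCount }, () => new ToyReplayStore());
  let requestIndex = 0;
  return (nonce: string): number => {
    const store = stores[route(requestIndex++) % instanceCount];
    return store.checkAndRecord(nonce) ? 401 : 200;
  };
}

// Sends `pairs` (probe, replay) pairs, each with a fresh nonce, and counts rejected replays.
function countRejectedPairs(verify: (nonce: string) => number, pairs: number): number {
  let rejected = 0;
  for (let i = 0; i < pairs; i++) {
    const nonce = `nonce-${i}`;
    verify(nonce); // first probe records the nonce on whichever instance it hits
    if (verify(nonce) === 401) rejected++; // replay is rejected only if it hits the same store
  }
  return rejected;
}
```

In this model, a single shared store rejects every replay, while a router that switches instances every third request (`i => Math.floor(i / 3) % 2`) rejects 3 of 4 pairs. Strict per-request round-robin across two instances (`i => i % 2`) rejects none, which is consistent with the 0/K diagnostic also carrying the multi-instance hint.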

Nits surfaced from pre-PR review (not fixed — low priority):

  • actual_error_code on the partial-failure path reflects only the final pair (may be undefined if the last pair accepted). The diagnostic text carries the real signal so this is informational noise at worst.
  • The multi-instance test server comment in the new test file could be clearer about the per-request alternation logic.

Pre-PR review

  • code-reviewer: approved after one blocker fix (hardcoded http_status: 200 on partial-failure path replaced with observed lastSecondStatus) — nits noted above
  • ad-tech-protocol-expert: approved after one blocker fix (changeset bump patch → minor: new exported fields + behavioral default change); K/K pass threshold is correct per RFC 9421 §11.1 unconditional MUST; PostgresReplayStore reference in diagnostic is appropriate

Triage-managed PR. This bot does not currently iterate on
review comments or PR conversation threads (only on the source
issue). To unblock:

  • Push fixup commits directly: gh pr checkout <num>
    fix → push.
  • Or re-trigger: comment /triage execute on the source
    issue.

See adcp#3121 for context.

Session: https://claude.ai/code/session_01LHTJkAnfwboYmLtJywQDpe


Generated by Claude Code

…ers (#1032)

Replace the single (probe1, probe2) pair in gradeReplayWindow with K
configurable pairs (default 10). Each pair generates a fresh nonce and
uses a new TCP connection (probeSignedRequest already closes its undici
Agent), so probes may land on different load-balanced instances.

When K/K pairs reject, the vector passes. When 0/K reject, the FAIL
diagnostic names per-process InMemoryReplayStore as the likely cause.
When 1-(K-1)/K reject, the diagnostic surfaces the partial-rejection
count and the cross-instance topology hypothesis, with a pointer to
PostgresReplayStore / a Redis-backed ReplayStore implementation.

Adds VectorGradeResult.replay_pairs_tried / replay_pairs_rejected and
GradeOptions.replayProbePairs (exposed as --replay-probe-pairs on the
CLI, min 2, default 10).

https://claude.ai/code/session_01LHTJkAnfwboYmLtJywQDpe
bokelley and others added 2 commits April 28, 2026 06:42
The 'broken verifier (no replay protection)' test built a custom store
with `check`/`record` methods, but the real `ReplayStore` interface is
`has`/`isCapHit`/`insert`. Calling the wrong methods made the verifier
throw on first request, causing the K-pair grader to early-exit on
the first iteration with replay_pairs_tried=1 instead of looping K=3
times. The test asserted ===3 and failed.

Rewrites noopStore to implement has/isCapHit/insert correctly, all
returning the no-op result that simulates a verifier with no replay
protection (always 'not seen', always 'ok' on insert).
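
A no-op store of the kind described above might look like the following sketch. Only the method names has/isCapHit/insert come from this commit message; the parameter lists, the synchronous return types, and the `InsertResult` values are assumptions made for illustration, and the real `ReplayStore` interface may differ (for example, it may be async).

```typescript
// Assumed result type; only 'ok' is named in the commit message.
type InsertResult = "ok" | "replayed" | "cap_hit";

// Assumed shape of the interface; signatures are hypothetical.
interface ReplayStoreSketch {
  has(keyId: string, nonce: string): boolean;
  isCapHit(keyId: string): boolean;
  insert(keyId: string, nonce: string, ttlSeconds: number): InsertResult;
}

// Simulates a verifier with no replay protection: nothing is ever remembered.
const noopStore: ReplayStoreSketch = {
  has: () => false,      // every nonce looks unseen
  isCapHit: () => false, // the per-key cap is never reached
  insert: () => "ok",    // inserts report success but store nothing
};
```

Because `has` keeps returning false after `insert`, every replayed probe is accepted, so the grader completes all K pairs and reports 0/K rejected instead of exiting early on a thrown error.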

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley bokelley marked this pull request as ready for review April 28, 2026 10:57
@bokelley bokelley merged commit 36d3c81 into main Apr 28, 2026
9 checks passed