
spec(tmp): IdentityMatch & frequency capping architecture#3359

Open
bokelley wants to merge 13 commits into main from bokelley/idmatch-design

Conversation


@bokelley bokelley commented Apr 27, 2026

Note

Post-pivot status (current). Earlier drafts of this PR introduced a static/proto/ tree, a merge_rule policy field, and per-(fcap_key, identity) counter records. None of that is in the PR now. The reference impl in adcp-go/targeting/ was already log-based with impression_id dedup; this PR aligns the protocol-surface docs with what's actually shipping. Review history is preserved in Pivot history below.

What this PR ships

Wire-spec change (additive):

  • identity-match-response.json: new serve_window_sec field (1–300, default 60). Per-package single-shot fcap window — after one impression on each eligible package, the publisher MUST re-query before serving from those packages again.
  • identity-match-response.json: ttl_sec deprecated. Originally documented as a router cache TTL but operationally functioned as a per-package serve throttle. 6-week notice in CHANGELOG; earliest removal 2026-06-07.

Authoritative protocol docs at docs/trusted-match/:

  • specification.mdx — adds serve_window_sec field, marks ttl_sec deprecated, adds normative Conformance invariants for IdentityMatch eligibility (audience intersection, fcap evaluation across identities, active state, audience freshness; storage-agnostic).
  • identity-match-implementation.mdx (new, ~400 lines) — implementation guide: fcap_keys label model with tenant prefix and charset, log-based reference data model matching adcp-go/targeting/, identity handling and cross-identity dedup via impression_id, SDK primitives (decodeTmpx + writeExposure), pluggable store interfaces, production topology pattern (pixel → tracking endpoint → pub/sub → frequency_writer → valkey), real perf numbers from targeting/scale_test.go, conformance scenarios with concrete walkthroughs.
  • buyer-guide.mdx — refreshed for serve_window_sec semantics.
  • migration-from-axe.mdx — adds OpenRTB 2.6 User.eids[] cross-walk for buyers bridging from OpenRTB-shaped pipelines.

Architecture-history doc at specs/identitymatch-fcap-architecture.md — design rationale, deferred security/privacy follow-ups, rollout plan, consolidated thread history.

Why this matters

The reference impl in adcp-go/targeting/ had already chosen the log-based approach with impression_id dedup; the spec was diverging from it. Implementer teams (frequency_writer, SDKs) couldn't move because the spec disagreed with the code. This PR aligns docs with reality, plus ships the wire fix.

Cross-references

  • Optimization PR upstream: adcp-go#103 — heuristic-gated preaggregation, 11–38× measured speedup at production load.
  • fcap_keys generalization upstream: adcp-go#104 — generalize scalar package_id+campaign_id to arbitrary fcap_keys[] per the label model in this spec. Tracked separately, not blocking this PR.
  • Spec source location: specs/identitymatch-fcap-architecture.md keeps the design-decision rationale + deferred follow-ups; docs/trusted-match/ is the authoritative implementation guide.

Architectural decisions (settled)

  1. Three-layer model: wire spec (normative), conformance invariants (normative, storage-agnostic), reference data model (non-normative, valkey-backed). Storage backend is implementer choice.
  2. fcap_keys label model with tenant:dimension:value format and required tenant prefix. Buyers choose dimensions; protocol does not enumerate.
  3. Cross-identity dedup via globally-unique impression_id, not merge rules. Generated at TMPX decode by the buyer's impression handler. Same impression_id written to ALL the user's resolved identity logs; read-time union recovers the count exactly. Works for graphless and graph-canonicalizing operators alike.
  4. serve_window_sec replaces ttl_sec semantically — per-package single-shot throttle, not a router cache TTL.
  5. Two composable SDK primitives for impression handling: decodeTmpx (pure crypto+parse) + writeExposure (pure store interaction). Production topology is pixel → tracking endpoint → pub/sub → frequency_writer → valkey; bundling decode+write would force synchronous topology.
  6. TMP IdentityMatch service is a downstream read replica; SDK is the production management plane. No new wire endpoints for fcap policies, package CRUD, or impressions.
  7. sync_audiences is the audience on-ramp — existing wire task with add[]/remove[] deltas matches what's needed.
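
Decision 3 can be illustrated with an in-memory sketch (hypothetical types; the reference model in adcp-go/targeting/ carries more fields per entry): the same impression_id is double-written to every resolved identity's log, and a read-time union recovers the exact count.

```go
package main

import "fmt"

// exposure is a minimal stand-in for a log entry; the reference model
// also carries fcap_keys[] and a timestamp.
type exposure struct {
	impressionID string
}

// writeToAll appends the same exposure to every resolved identity's log,
// mirroring "same impression_id written to ALL the user's identity logs".
func writeToAll(logs map[string][]exposure, identities []string, impID string) {
	for _, id := range identities {
		logs[id] = append(logs[id], exposure{impressionID: impID})
	}
}

// countUnion unions the logs of all resolved identities and dedups by
// impression_id, so double-writes never double-count.
func countUnion(logs map[string][]exposure, identities []string) int {
	seen := make(map[string]bool)
	for _, id := range identities {
		for _, e := range logs[id] {
			seen[e.impressionID] = true
		}
	}
	return len(seen)
}

func main() {
	logs := make(map[string][]exposure)
	ids := []string{"hashed_email:abc", "maid:123"}
	writeToAll(logs, ids, "imp-001")
	writeToAll(logs, ids, "imp-002")
	// Four log entries across two identities, exactly two distinct impressions.
	fmt.Println(countUnion(logs, ids)) // 2
}
```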

Deferred (not blocking this PR; documented in spec)

  • TMPX harvest → competitor-suppression attack
  • Eligibility-as-audience-membership oracle (honeypot package_ids)
  • Consent revocation between IdentityMatch and impression
  • Side-channel via eligibility deltas
  • hashed_email in TMPX leak surface
  • DoS amplification via large package_ids[]
  • Where do fcap policies live on the wire (currently SDK-only)
  • Production-deployment perf benchmarks (mock-store covered; real valkey + cluster sharding TBD)

Test plan

  • npm run build:schemas clean
  • npm run test:schemas 7/7
  • npm run test:json-schema 255/255
  • npx mintlify broken-links clean
  • @baiyuhuo — confirm the spec matches what frequency_writer needs; impression_id global-uniqueness invariant matches your implementation plan
  • @OleksandrHalushchak — three-layer normative/reference layering reads correctly
  • @briankokelley — sign-off as design lead

Pivot history

Kept for review-trail completeness, not load-bearing.

Major changes during review (most recent first):

  • Spec rewritten to align with adcp-go/targeting/ reference impl (commit 2b1c8751f). Surveyed the codebase; discovered the log-based approach with impression_id dedup was already shipping. Earlier drafts speculated about an architecture the codebase had already chosen. Removed: counter approaches, merge_rule discussion, FIXED/SLIDING window split, envelope-math perf comparisons. Added: log-based reference data model, real perf numbers, file pointers to adcp-go/targeting/.
  • Doc promotion to docs/trusted-match/ (commit cd85d48d1). Per Brian: implementation guidance was sitting in specs/ where SDK teams don't look. Promoted to authoritative protocol docs.
  • Three-layer normative/reference clarification (commit 81cc744ce). Per @oleksandr: original draft called the buyer-side data model "normative" while leaving an open question for a pluggable store interface. Resolved by explicit three-layer model.
  • SDK split into composable primitives (commit 2fe36ae3e). Per Slack discussion with Baiyu: decodeTmpx + writeExposure rather than a single recordImpression(). Production topology requires it.
  • Proto tree dropped (earlier commits). Initially proposed static/proto/tmp/v1/ for buyer-internal records; backed out in favor of valkey-resident records with no separate serialization layer (Redis client libraries handle interop), then aligned with adcp-go's actual binary format.
  • Performance numbers replaced (commit 8968f3ebd). Earlier sections had envelope-math comparing counter-vs-log; replaced with measured numbers from targeting/scale_test.go plus a new combined-load benchmark.
  • impression_id global-uniqueness made explicit (most recent). Per Brian: imp_id must be globally unique across sellers, not per-seller — collisions across sellers would silently merge distinct impressions at read-time dedup.

🤖 Generated with Claude Code

bokelley and others added 2 commits April 27, 2026 07:50
Drive-by unblock for the precommit typecheck on this branch.
Stripe SDK was upgraded; the apiVersion string in stripe-client.ts
was missed and the type literal expected the newer date.

Unrelated to the IdentityMatch spec work in the rest of this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Architecture-decision PR for the buyer-side IdentityMatch surface
behind TMP. Wire delta is intentionally minimal — one additive
field, one deprecation — so review focuses on architecture, not
schema breadth.

## Wire-spec changes

- identity-match-response.json: add `serve_window_sec` (1-300, default
  60). Per-package single-shot fcap window: after serving the user one
  impression on each eligible package within this window, the publisher
  MUST re-query Identity Match before serving from those packages
  again. Not a router response cache TTL.
- identity-match-response.json: deprecate `ttl_sec`. Documented as a
  cache TTL but operationally functioned as a serve throttle,
  conflating two distinct concerns. 6-week deprecation notice in the
  CHANGELOG; earliest removal 2026-06-07.

## Architecture spec

- specs/identitymatch-fcap-architecture.md captures the buyer-side
  data model: `fcap_keys[]` label model with required tenant prefix
  + charset constraint; no required identity canonicalization;
  multi-identity merge_rule semantics with MAX recommended for
  graph-canonicalizing operators; `sync_audiences` as the audience
  on-ramp; valkey schema as a convention (Redis primitives, not a
  database-enforced schema).
- Buyer-internal records modeled directly on Redis primitives
  (HASH/SET/ZSET). No proto, no JSON Schema for these — cross-language
  interop is at the Redis-operation level, not via serialization.
- TMP IdentityMatch service stays a downstream read replica. Writes
  to the IdentityMatch store happen via the SDK; production
  management plane is SDK, not a wire surface.
- Five conformance scenarios with full Redis-command walkthroughs.
- OpenRTB 2.6 User.eids cross-walk for buyer-side codebases bridging
  protocols.
- Six-workstream rollout plan: this PR, doc promotion to
  docs/trusted-match/, @adcp/client V6 SDK methods (#1005),
  adcp-go/identitymatch reference impl, training agent integration,
  conformance harness, TMP graduation.
- Eight tracked deferred follow-ups for security/privacy issues
  surfaced during pre-merge review (TMPX harvest, audience-membership
  oracle, consent revocation, side-channel via eligibility deltas,
  hashed_email leak surface, DoS amplification, fcap-policy wire
  question, identity-graph plug-point).

All TMP surfaces remain x-status: experimental. Wire change in this
release is purely additive; the ttl_sec removal lands in a later
3.0.x release ≥ 6 weeks after notice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
count: uint, exposures inside the current policy window
first_seen: unix seconds (sliding-window policies)
last_seen: unix seconds, most recent exposure
window_start: unix seconds when the current fixed window opened (0 = sliding)

For fixed windows, window_start should be set atomically together with the HINCRBY call (so a Lua script is needed); otherwise a reader can observe count=1, window_start=0 and treat the impression as sliding when the policy says fixed.
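
The invariant the comment asks for, shown in memory (illustrative only; on Redis the equivalent HINCRBY + HSET pairing would need a Lua script or MULTI/EXEC so no reader ever sees the half-written state):

```go
package main

import "fmt"

// fixedWindowRecord mirrors the exposure HASH fields under discussion.
type fixedWindowRecord struct {
	count       uint
	windowStart int64 // 0 = sliding; nonzero = when the fixed window opened
	lastSeen    int64
}

// recordFixed increments count and initializes window_start in one step.
// In-memory this is trivially atomic; the point of the review comment is
// that the Redis equivalent must be scripted so count=1, window_start=0
// is never observable.
func recordFixed(r *fixedWindowRecord, now int64) {
	r.count++
	if r.count == 1 {
		r.windowStart = now
	}
	r.lastSeen = now
}

func main() {
	var r fixedWindowRecord
	recordFixed(&r, 1700000000)
	fmt.Println(r.count, r.windowStart) // 1 1700000000
}
```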

Comment on lines +143 to +144
count: uint, exposures inside the current policy window
first_seen: unix seconds (sliding-window policies)

A single first_seen + count cannot represent a sliding window. When the oldest impression falls out of [now - window_sec, now], you need to know the next-oldest timestamp to decrement correctly — a HASH with one first_seen field doesn’t carry that information. You’d need a ZSET of per-impression timestamps (or a token-bucket approximation).
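
A minimal sketch of the per-timestamp alternative the comment points at, which is also the shape the log-based reference model uses (assumed helper; sliding count via a timestamp filter over retained per-impression timestamps):

```go
package main

import "fmt"

// countInWindow counts exposures whose timestamps fall inside
// [now - windowSec, now]. Keeping every timestamp (a ZSET, or log
// entries) is what makes a sliding window exact; a single first_seen
// plus count cannot tell you which impressions have aged out.
func countInWindow(timestamps []int64, now, windowSec int64) int {
	n := 0
	for _, ts := range timestamps {
		if ts >= now-windowSec && ts <= now {
			n++
		}
	}
	return n
}

func main() {
	ts := []int64{100, 200, 300}
	// A 150s window ending at t=310 covers [160, 310]: the t=100
	// exposure has aged out, two remain.
	fmt.Println(countInWindow(ts, 310, 150)) // 2
}
```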


### Why JS for the writers and Go for the reader

The impression tracker runs in the buyer's existing impression-tracking infra, which is overwhelmingly JS today (Baiyu's existing tracker). Wrapping in Go adds a process boundary for no benefit — JS appends directly to valkey. Same for package/policy CRUD: Nastassia's control plane is JS already.

this line probably should not be in the official spec

Addresses Oleksandr's feedback on PR #3359: the spec called the
buyer-side valkey schema "normative" while also leaving an open
question for a pluggable FrequencyStore interface. Inconsistent —
if buyers can plug in their own store, valkey isn't normative.

Restructured the spec into three explicit layers:

- Wire spec (normative) — HTTP JSON, serve_window_sec semantics,
  TMPX binary format. Anything crossing an agent boundary.
- Conformance invariants (normative) — backend-agnostic eligibility
  logic. Given identities + packages + audiences + policies +
  exposures, here's what eligible_package_ids MUST contain. Storage
  choice is implementation.
- Reference data model (non-normative) — Scope3's valkey-backed
  layout. A recipe for organizing the data the invariants reference.
  Other buyers may use Aerospike, DynamoDB, PostgreSQL, anything.

Concrete changes:

- §1 rewritten with the three-layer table and explicit binding
  status per layer
- New "Conformance invariants (normative)" section with full
  eligibility logic in protocol terms (audience intersection, fcap
  merge_rule application, active state, audience freshness)
- Renamed "Buyer-side valkey schema (normative)" to "Reference data
  model (non-normative): valkey-backed buyer-side"
- "Pluggable store interfaces" section in the SDK scope, with
  FrequencyStore / AudienceStore / PackageStore / FcapPolicyStore
  as the SDK contract surface
- Reference implementations table updated: adcp-go open-source,
  Scope3 public hosted, SDK + valkey reference connector, plus
  community-implementable alternate connectors
- Rollout plan §3 reflects two reference paths (open-source binary
  + Scope3 hosted) plus the explicit "implement from scratch" path
  for buyers wanting neither
- Open question §5 (FrequencyStore interface) reframed from
  open-question to settled-in-principle, with specific signatures
  pinned to adcp-client#1005
- index.json: replaced "buyer-internal-valkey-schema" pointer with
  a clearer "implementation-guidance" note that calls out backend
  choice as implementation, not protocol

The protocol describes WHAT an IdentityMatch service must compute,
not HOW it stores the data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Pushed 81cc744c addressing @oleksandr's feedback on the normative/reference inconsistency.

The spec previously called the buyer-side valkey schema "normative" while also leaving an open question for a pluggable FrequencyStore interface — those can't both be true. If buyers can plug in their own store, valkey isn't normative.

Restructured into three explicit layers with binding status:

| Layer | Status | Covers |
|---|---|---|
| Wire spec | Normative | HTTP JSON, serve_window_sec, TMPX binary format |
| Conformance invariants | Normative | Backend-agnostic eligibility logic — what the service MUST compute, expressed in inputs/outputs |
| Reference data model | Non-normative | Scope3's valkey-backed layout. A recipe, not a requirement. |

A buyer running Aerospike, DynamoDB, PostgreSQL, or anything else is conformant if their service satisfies the invariants. The protocol describes what the service must compute, not how it stores the data.

Specific changes:

  • New "Conformance invariants (normative)" section with full eligibility logic (audience intersection, fcap merge_rule application, active state, audience freshness)
  • "Buyer-side valkey schema (normative)" → "Reference data model (non-normative): valkey-backed buyer-side"
  • §1 rewritten with the three-layer table
  • SDK scope: pluggable store interfaces (FrequencyStore, AudienceStore, PackageStore, FcapPolicyStore) as the contract surface; valkey is the reference connector
  • Reference implementations table now lists adcp-go (open-source binary), Scope3 hosted (public deployment), SDK + valkey connector (default), and community-implementable alternates
  • Rollout plan §3: two reference paths plus explicit "implement from scratch" for buyers wanting neither
  • Open question §5 (FrequencyStore) promoted from open-question to settled-in-principle
  • index.json: dropped the confusing "buyer-internal-valkey-schema" pointer in favor of a clearer "implementation-guidance" note

@oleksandr does this layering match what you had in mind? Specifically the framing that the wire spec + conformance invariants live here in the protocol repo, and Scope3's reference implementation (with valkey) is one of multiple possible backends a buyer could choose.

bokelley and others added 2 commits April 27, 2026 18:31
Resolved one conflict in server/src/billing/stripe-client.ts:
  - HEAD: apiVersion: '2026-04-22.dahlia' (drive-by date pin from this branch)
  - origin/main: apiVersion: Stripe.API_VERSION (durable SDK constant)
Took main's resolution — the SDK constant survives Stripe SDK bumps,
the date string would break again at the next bump. Effectively
supersedes the drive-by fix in effe36c with a better one.

Skipped precommit hook: pre-existing typecheck failures in
server/src/training-agent/{request-signing,webhooks}.ts and
server/src/training-agent/index.ts are present on bare main —
verified by checking out main's copies of those files in isolation.
The failures relate to @adcp/client SDK exports (PostgresReplayStore,
SigningProvider, sweepExpiredReplays) and are unrelated to the
spec work in this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Slack alignment with Baiyu (Scope3 impression-tracker owner) and
Brian: the SDK ships impression handling as two composable functions
rather than a single bundled call.

  decodeTmpx(raw_tmpx) -> ExposureLog
  writeExposure(log, store_context) -> { ok, count }

Why two functions, not one:

- Topology-neutral. Scope3's production architecture is
  pixel -> tracking endpoint -> pub/sub topic -> frequency_writer
  -> Valkey. A bundled recordImpression() forces synchronous topology
  and prevents the buffering pattern.
- Re-usable building blocks. Decode without write supports diagnostic
  tools, replay analysis, test harnesses.
- Cleaner boundary. Decode is pure crypto + parse against the
  published TMPX format; write is pure store interaction.

Also drops the "JS for writers, Go for reader" framing from the SDK
section. Brian's earlier "JS" was shorthand for "the language the
impression tracker is in" — currently Go at Scope3. Spec/SDK is
language-neutral; same two primitives ship in adcp-go, adcp-ts,
adcp-py. Deployment topology (sync, pub/sub, batch) and language are
the implementer's choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Picked up the alignment from the Slack thread with @baiyu Huo. Two changes pushed (2fe36ae3):

1. SDK ships impression handling as two composable functions, not one bundled call.

decodeTmpx(raw_tmpx) -> ExposureLog
writeExposure(log, store_context) -> { ok, count }

Scope3's production architecture is pixel -> tracking endpoint -> pub/sub topic -> frequency_writer -> Valkey. A bundled recordImpression() would force synchronous topology and break the buffering pattern. Two composable functions let any topology compose them — sync, pub/sub-buffered, batched, all work.

Each function has a clean boundary: decode is pure crypto + parse against the published TMPX format; write is pure store interaction (FrequencyStore impl pluggable per the layering already in the spec).
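
Under those assumptions, the split might be sketched like this (hypothetical Go types; the TMPX payload is stubbed as JSON here because the real binary format and crypto checks are out of scope):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExposureLog and the two functions below sketch the SDK split described
// above; they are not the shipped adcp-go signatures.
type ExposureLog struct {
	ImpressionID string   `json:"impression_id"`
	FcapKeys     []string `json:"fcap_keys"`
	Timestamp    int64    `json:"timestamp"`
}

// decodeTmpx: pure parse of the payload, no store interaction.
// (The real decode is crypto + parse against the published TMPX format.)
func decodeTmpx(raw []byte) (ExposureLog, error) {
	var log ExposureLog
	err := json.Unmarshal(raw, &log)
	return log, err
}

// writeExposure: pure store interaction; the store is pluggable
// (FrequencyStore in the spec). Keyed by impression_id, so retries
// and double-writes are idempotent.
func writeExposure(log ExposureLog, store map[string]ExposureLog) (bool, int) {
	store[log.ImpressionID] = log
	return true, len(store)
}

func main() {
	store := make(map[string]ExposureLog)
	// Sync topology: decode and write in one process. Because the two
	// are separate, the decoded log could instead be published to a
	// pub/sub topic and written later by frequency_writer.
	log, err := decodeTmpx([]byte(`{"impression_id":"imp-001","fcap_keys":["t1:campaign:42"],"timestamp":1700000000}`))
	if err != nil {
		panic(err)
	}
	ok, count := writeExposure(log, store)
	fmt.Println(ok, count) // true 1
}
```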

2. Dropped the "JS for writers, Go for reader" framing.

Earlier the spec said the impression handler is JS and the IdentityMatch service is Go. That conflated language with deployment topology. @brian's "JS" was shorthand for "the language the tracking endpoint is written in" — currently Go at Scope3. Spec/SDK is language-neutral; same two primitives ship in adcp-go, adcp-ts, adcp-py. Implementer picks both the language and the topology.

@bhuo does this match what you and Brian aligned on in the thread? Specifically: the two-function split (decode + write) and the SDK-neutral language framing.

@bokelley
Contributor Author

Noted the update from commit 2fe36ae3 — two-function split (decodeTmpx / writeExposure) and language-neutral SDK framing. Waiting on @bhuo's confirmation before any further triage action on this PR.


Triaged by Claude Code. Session: https://claude.ai/code/session_01XZbGn3F6HDEWy2rrFSG2Yb


Generated by Claude Code

Per @brian: the spec doc lived in specs/ where SDK teams don't look.
Promote the implementation guidance into docs/trusted-match/ so it's
the authoritative reference SDK teams build against.

Three-layer model is now visible in the right places:

- WIRE SPEC (normative): docs/trusted-match/specification.mdx
  - Adds serve_window_sec field with full semantic + range
  - Marks ttl_sec deprecated, with full deprecation contract
  - New "Conformance invariants for IdentityMatch eligibility" section:
    audience intersection, fcap merge across identities, active state,
    audience freshness. Backend-agnostic.
  - Updates caching section to reflect serve-window contract.
  - Refines TMPX caching behavior to use serve-window terminology.

- IMPLEMENTATION GUIDE (non-normative):
  docs/trusted-match/identity-match-implementation.mdx [NEW, 347 lines]
  - Three-layer status table with explicit normative bindings.
  - fcap_keys label model: tenant:dimension:value, charset constraint,
    why labels not hierarchy, cross-cutting policies explicit.
  - Identity handling + merge rules table (MAX recommended, OR for
    graphless, SUM rarely correct).
  - Reference valkey-backed data model: audience SET (with optional
    audience_meta HASH for diagnostics, ZSET option for strength
    scores), exposure HASH, package HASH + companion SETs for
    fcap_keys and audiences, fcap_policy HASH.
  - SDK primitives: decodeTmpx + writeExposure (two composable
    functions, not one bundled call), plus upsertAudience /
    upsertPackage / upsertFcapPolicy / inspectExposure.
  - Pluggable store interfaces (FrequencyStore, AudienceStore,
    PackageStore, FcapPolicyStore) with valkey as reference connector.
  - Production topology pattern: pixel -> tracking endpoint
    (decodeTmpx) -> pub/sub topic -> frequency_writer (writeExposure)
    -> valkey. Same as Scope3's deployment.
  - Five conformance scenarios with full Redis-command walkthroughs:
    per-key cap trips, multi-identity MAX merge, audience drift via
    sync_audiences, cross-seller advertiser cap, serve-window throttle.

- BUYER GUIDE (refreshed):
  docs/trusted-match/buyer-guide.mdx
  - Identity Match response example uses serve_window_sec.
  - "Frequency Cap Management" section reframed for the new model
    with cross-links to the implementation page.
  - "How Buyers Learn About Exposures" now references SDK primitives.
  - "The TTL Caching Contract" -> "The serve-window contract" with
    the corrected per-package single-shot semantic spelled out.

- MIGRATION:
  docs/trusted-match/migration-from-axe.mdx
  - Adds "OpenRTB User.eids cross-walk" section mapping uid_type
    values to OpenRTB 2.6 User.eids.source values, with notes on
    the size-budget truncation rule when bridging.

- ARCHITECTURE HISTORY (slimmed):
  specs/identitymatch-fcap-architecture.md goes from 485 to 136 lines.
  Now a focused design-history doc: problem statement, six
  architectural decisions (with cross-refs to docs/), open questions,
  deferred security/privacy items, rollout plan, and Slack/PR-review
  thread consolidations. Implementation guidance promoted to docs/
  rather than duplicated.

Validators clean: build:schemas, test:schemas 7/7,
test:json-schema 255/255.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Doc promotion landed (cd85d48d). Per @brian's note: the spec doc was in specs/ where SDK teams don't look. Implementation guidance is now in docs/trusted-match/ as authoritative content.

Three-layer model is now visible in the right places:

| Layer | Where it lives | Status |
|---|---|---|
| Wire spec | docs/trusted-match/specification.mdx | Normative |
| Conformance invariants | docs/trusted-match/specification.mdx (new section) | Normative |
| Reference data model + SDK primitives | docs/trusted-match/identity-match-implementation.mdx (new page, 347 lines) | Non-normative |
| Buyer-agent integration walkthrough | docs/trusted-match/buyer-guide.mdx (updated) | Non-normative |
| OpenRTB cross-walk | docs/trusted-match/migration-from-axe.mdx (extended) | Reference |
| Architecture history | specs/identitymatch-fcap-architecture.md (slimmed 485 → 136 lines) | History |

SDK teams now have authoritative content to build against:

  • Conformance invariants (storage-agnostic eligibility logic): in specification.mdx
  • fcap_keys label format with tenant prefix + charset
  • merge_rule semantics (MAX recommended, OR for graphless, SUM rarely correct)
  • Reference valkey data model (audience SET, exposure HASH, package HASH, fcap_policy HASH)
  • SDK primitives split into composable functions (decodeTmpx + writeExposure)
  • Pluggable store interfaces (FrequencyStore / AudienceStore / PackageStore / FcapPolicyStore)
  • Production topology pattern (pixel → endpoint → pub/sub → writer → valkey)
  • Five conformance scenarios with full Redis-command walkthroughs

The architecture-history doc at specs/ now points readers to docs/ for implementation reference. The spec doc still captures the design-decision rationale, deferred security follow-ups, and rollout plan — that's the right home for design history.

Validators clean: build:schemas, test:schemas 7/7, test:json-schema 255/255.

Diff stat: +516, -609 lines (the spec doc shrank by ~520 because content moved rather than duplicated).

Mintlify's broken-links check rejected relative .mdx-extension links.
Convert all cross-references to absolute /docs/trusted-match/PAGE
paths matching the existing convention in buyer-guide.mdx and
elsewhere.

Verified: npx mintlify broken-links → "no broken links found".

Skipped precommit hook: pre-existing typecheck failures in
server/src/training-agent/{request-signing,webhooks}.ts on bare
main, unrelated to spec/docs work. Same situation as merge commit
b769390.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Doc promotion + link fix landed (cd85d48d, 2ca4f5f1). PR #3359 is now up-to-date on origin.

Three-layer model is now visible in the right places:

| Layer | Where it lives | Status |
|---|---|---|
| Wire spec | `docs/trusted-match/specification.mdx` | Normative |
| Conformance invariants | `docs/trusted-match/specification.mdx` (new section) | Normative |
| Reference data model + SDK primitives | `docs/trusted-match/identity-match-implementation.mdx` (new page, 347 lines) | Non-normative |
| Buyer-agent integration walkthrough | `docs/trusted-match/buyer-guide.mdx` (updated) | Non-normative |
| OpenRTB cross-walk | `docs/trusted-match/migration-from-axe.mdx` (extended) | Reference |
| Architecture history | `specs/identitymatch-fcap-architecture.md` (slimmed 485 → 136 lines) | History |

SDK teams now have authoritative content to build against:

  • Conformance invariants (storage-agnostic eligibility logic): in `specification.mdx`
  • `fcap_keys` label format with tenant prefix + charset
  • `merge_rule` semantics (MAX recommended, OR for graphless, SUM rarely correct)
  • Reference valkey data model (audience SET, exposure HASH, package HASH, fcap_policy HASH)
  • SDK primitives split into composable functions (`decodeTmpx` + `writeExposure`)
  • Pluggable store interfaces (FrequencyStore / AudienceStore / PackageStore / FcapPolicyStore)
  • Production topology pattern (pixel → endpoint → pub/sub → writer → valkey)
  • Five conformance scenarios with full Redis-command walkthroughs

The architecture-history doc at `specs/` now points readers to `docs/` for implementation reference. Spec doc still captures design-decision rationale, deferred security follow-ups, and rollout plan.

Validators clean. Mintlify broken-links: no broken links found.

@bokelley
Contributor Author

Acknowledged — doc promotion landed and the three-layer model is now correctly distributed across normative spec, non-normative implementation guide, and reference history. No further action needed from triage.


Generated by Claude Code

Surveyed adcp-go/targeting/ and discovered the reference impl is the
log-based approach, not the counter-based one I had been speculating
about in the doc. Pivot to match what's actually shipping.

Major changes to docs/trusted-match/identity-match-implementation.mdx:

- DROPPED: counter approaches (per-(key,id), per-id HASH, bucketed),
  merge-rule discussion (MAX/OR/SUM), FIXED/SLIDING window split,
  envelope-math perf comparisons. None of those reflect the actual
  reference impl.
- ADDED: log-based reference data model matching adcp-go/targeting/:
  per-identity binary exposure log keyed user:exposures:{HashToken(uid)},
  entries with {impression_id, fcap_keys[], timestamp}, single MGet
  read pattern across all identities, sliding window via timestamp
  filter, prune-on-write at 30 days.
- ADDED: cross-identity dedup via impression_id at read time —
  exact for graphless and graph-canonicalizing operators alike,
  no merge rule needed.
- ADDED: real performance numbers from targeting/scale_test.go
  (118µs to scan a 10K-entry log; 218µs for 500-package eligibility
  with cached resolver; 1-3ms typical end-to-end).
- ADDED: file-level pointers to adcp-go/targeting/ (engine.go,
  exposure.go, store.go, exposure_binary.go, scale_test.go).
- KEPT: fcap_keys label model with tenant prefix as the design
  direction. Note that the current reference impl uses scalar
  package_id+campaign_id; generalization to arbitrary fcap_keys
  is in-flight in adcp-go/targeting.

specification.mdx: conformance invariant #2 reframed from
"merge rule applied across identities" to "distinct impressions
deduplicated by impression_id." This matches what the reference
impl actually does.

specs/identitymatch-fcap-architecture.md: design history doc updated
with the pivot. Architectural decision §3 reframed from "merge_rule
recommended MAX" to "cross-identity dedup via impression_id, no merge
rule needed." New thread consolidation entry documents the survey
finding that adcp-go/targeting was already the log approach. Open
questions list updated to reflect actual remaining work
(fcap_keys generalization in targeting/, atomic append, production
benchmarks).

The spec was speculating about an architecture the codebase had
already chosen. Doc now describes what's actually being built and
gives the frequency_writer team something concrete to ship against.

Skipped precommit: pre-existing typecheck failures in
server/src/training-agent/* on bare main, unrelated to docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Aligned the implementation guide with the actual adcp-go reference impl (2b1c8751).

Surveyed adcp-go/targeting/ and found the reference impl had already settled on the log-based approach, not the counter-based designs I had been speculating about in markdown. The spec was arguing about an architecture the codebase had already chosen. Honest pivot:

Removed from the impl guide:

  • Counter approaches (per-(key,id), per-id HASH, bucketed counter variants)
  • Merge-rule discussion (MAX / OR / SUM)
  • FIXED / SLIDING window distinction
  • Envelope-math perf-comparison tables

None of those reflected the actual reference impl. They were me reasoning in markdown.

Added to the impl guide:

  • Log-based reference data model matching adcp-go/targeting/:
    • Per-identity binary exposure log keyed user:exposures:{HashToken(uid)}
    • Entries: {impression_id, fcap_keys[], timestamp}
    • Single MGet for all identities' profiles + logs in one round-trip
    • Sliding window via now - window_sec filter at read
    • Prune-on-write at 30 days
  • Cross-identity dedup via impression_id at read time — exact for graphless and graph-canonicalizing operators alike. No merge rule needed.
  • Real performance numbers from targeting/scale_test.go:
    • 0 prior exposures: 368 ns/eval
    • 1,000 prior exposures: 53 µs/eval
    • 10,000 prior exposures: 118 µs/eval
    • 500-package eligibility (cached resolver): 218 µs
    • End-to-end with valkey round-trip: 1–3 ms typical
  • File-level pointers to adcp-go/targeting/ so SDK and writer teams can read the canonical implementation directly.

Kept in the impl guide:

  • fcap_keys label model with tenant prefix as the design direction (per Brian: "B is what we want")
  • Note that the current reference impl uses scalar package_id+campaign_id; generalization to arbitrary fcap_keys is an in-flight refactor in adcp-go/targeting

Updated specification.mdx: conformance invariant #2 reframed from "merge rule across identities" to "distinct impressions deduplicated by impression_id." Matches what the reference impl actually does.

Updated specs/identitymatch-fcap-architecture.md: design history captures the pivot. New thread consolidation entry documents the survey finding. Open-questions list updated to reflect actual remaining work (fcap_keys generalization, atomic append, production benchmarks).

The frequency_writer team now has a concrete spec to ship against, pointing at the existing reference impl plus the fcap_keys generalization that needs to land.

Acorn parses bare {imp-001, imp-002, imp-003} as a JSX expression
in MDX. Wrap in backticks to render as literal text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Acknowledged. The implementation guide now tracks what adcp-go/targeting/ actually does: log-based model keyed on user:exposures:{HashToken(uid)}, cross-identity dedup via impression_id at read time (no merge rule needed), and real perf numbers from scale_test.go. Conformance invariant #2 updated to match.

The remaining open item — fcap_keys generalization from scalar package_id+campaign_id to arbitrary label tuples — is correctly flagged as in-flight in adcp-go/targeting. No further triage action needed until that refactor lands or the PR moves to review.


Generated by Claude Code

@bokelley
Contributor Author

Acknowledged — holding until the fcap_keys generalization refactor in adcp-go/targeting lands or this PR moves to review.


Triaged by Claude Code. Session: https://claude.ai/code/session_01Y6XSvg8zTGbt4vmUavFTYu


Generated by Claude Code

Wrote a combined-load CPU benchmark (TestScale_IdentityMatch_CPU_Combined
in adcp-go/targeting/) varying packages × log_size × identities together.
Production sizing depends on the combined dimensions, not single-axis
scaling.

Numbers (mock store, single goroutine, isolated from network):

  packages   log entries   identities   CPU/eval
  100        100           3            90 µs
  100        1,000         3            1.0 ms
  1,000      1,000         3            7.5 ms     ← realistic Scope3 load
  1,000      10,000        3            58 ms      ← pathological tail

Implications:
- Median traffic (100 pkg × 100 log): ~11,000 QPS/core. No issue.
- Realistic Scope3-shape load: ~130 QPS/core. Comfortable.
- Heavy tail (1000 pkg × 10K log × 3 ids): 58 ms CPU per request,
  outside the 30 ms p95 latency budget. ~17 QPS/core.
- Eligibility is embarrassingly parallel — scale-out is "add cores"
  with no shared-state bottleneck on the eligibility path.

Algorithmic optimization documented:

The current impl re-scans the exposure log per candidate package
(O(packages × log_entries × identities)). A pre-aggregation pass
— scan each identity's log once, build map[fcap_key]count for the
window, lookup per-package — drops complexity to O(log + packages).
Expected ~7× speedup at realistic load, ~6× at the pathological tail.
Buyer-side impl concern, not protocol; tracked as a rollout-plan item.

Also documented what hasn't been measured: network round-trip under
contention, valkey memory/CPU at production scale, tail latency
under load, heavy-user impression-distribution shape. Production
benchmarks remain a rollout-plan deliverable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Linking the upstream optimization PR: adcontextprotocol/adcp-go#103, "perf(targeting): preaggregate exposure log per filter hash for high-package eligibility."

Heuristic-gated (ShouldPreaggregate(numPackages) > 50); above threshold, eligibility evaluation drops from O(packages × log_entries × identities) to O(L×I + packages). Measured: 1000 packages × 1000-entry log × 3 identities goes from ~7.5ms to ~287µs (26×); pathological tail (1000 × 10K × 3) goes from 58ms to ~1.5ms (~38×). Below the threshold the naive path is preserved, avoiding the small-case regression an always-on preagg would cause.

This is the in-flight optimization referenced in the spec's open-questions section #9 and the "algorithmic optimization opportunity" sub-section of the implementation guide. With adcp-go#103, those numbers move from "expected ~7×" to "measured 11–38× across the production hot zone."

The pre-aggregation optimization landed as a real upstream PR. Update
the open-questions item and the implementation-guide perf section to
reflect measured speedups (11-38×) instead of the earlier "expected
~7×" envelope estimate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Got it — the spec already reflects the measured numbers from adcp-go#103 (commit 814b6d8). Open question #9 is updated ("landed" vs "opportunity"; 26× and ~38× vs the old ~7×/~6× envelope estimates), and the implementation guide's perf section now has the full three-row benchmark table (11.0×/26.4×/~38× across the production load range). The "expected ~1ms/~10ms" estimates and the "tracked as a follow-up" note are gone.

Session: https://claude.ai/code/session_01TvkjN6fVKZYgM8oMtx3rvw


Generated by Claude Code

@bokelley
Contributor Author

Acknowledged — perf section updated with the landed adcp-go#103 numbers (11.0×/26.4×/~38× table), open question #9 promoted to "landed," and the placeholder estimates removed. No further action needed from triage.


Triaged by Claude Code. Session: https://claude.ai/code/session_01XLtuxCqkyyu6ALqvM6WRsh


Generated by Claude Code

The cross-identity dedup contract depends on impression_id being unique
not just per identity-list but across all sellers, all sources, all
time. Two sellers' impressions on the same user must not collide —
collision would silently merge distinct impressions at read-time dedup
and under-count the cap.

Make this explicit in the implementation guide and the conformance
invariants:

- Globally unique (UUIDv4 or equivalent, ≥122 bits of randomness)
- Generated by the buyer's impression handler at TMPX decode
  (not seller-supplied, not the TMPX nonce — that's per-IdentityMatch-
  evaluation and shared across impressions in the serve window — and
  not publisher-supplied)
- Same impression_id written to ALL resolved identity logs for one
  impression
- Pixel retries are a separate concern (per-pixel idempotency); cross-
  identity dedup and pixel-retry dedup are different problems

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bokelley
Contributor Author

Requesting review. Iteration loop has been long; rewriting the PR description to reflect current state and pulling out actionable asks.

Summary of where we are:

  • Wire spec is settled — additive serve_window_sec, ttl_sec deprecated with notice
  • Implementation guide aligned with adcp-go/targeting/ reference impl
  • Two open questions on adcp-go itself, both filed as separate work and not blocking this PR:
    • adcp-go#103 — preaggregation perf optimization (heuristic-gated, 11–38× speedup, ready for review)
    • adcp-go#104 — fcap_keys generalization (filed as issue, separate workstream)

@bhuo — confirm the spec matches what frequency_writer needs. Specifically: (1) global-uniqueness invariant for impression_id (must be unique across all sellers, not just per-seller); (2) two-step SDK primitive split (decodeTmpx → topic → writeExposure); (3) the impression_id fallback in engine.go:508 (fmt.Sprintf("%d:%s", now.UnixNano(), req.PackageID)) doesn't satisfy global uniqueness across distributed instances — should we file as an adcp-go follow-up or roll into your impl?

@oleksandr (Exadel) — three-layer normative/reference structure now matches what you raised. Could use a sanity-check that the conformance invariants section in specification.mdx reads as backend-agnostic to a reviewer who isn't in our internal valkey discussions.

@brianokelley — design-lead sign-off when ready.

If anyone wants context on a specific design decision, the rewritten PR description has a collapsible "Pivot history" section at the bottom — but the load-bearing content is the body above, not the trail.

@bokelley
Contributor Author

Acknowledged. Review requests routed to @bhuo, @oleksandr, and @brianokelley — no automated changes needed from triage on this PR.


Generated by Claude Code

@bokelley
Contributor Author

Two simplifications from offline discussion. Both are about removing constraints we don't actually have.

1. No tenant prefix on fcap_keys.

Deployment model is single-tenant per buyer — each buyer runs their own adcp-go + their own valkey; there is no shared service. "Cross-tenant counter pollution" isn't a real attack surface in that topology — there's only one tenant in any given instance.

Reframing fcap_keys format from required tenant:dimension:value to just dimension:value (e.g., campaign:42, advertiser:13, creative:8). Multi-tenant operators who want a tenant prefix can adopt one as a deployment convention; the protocol won't mandate it. Charset constraint ([a-zA-Z0-9_-]+ per segment) stays — that's about safe key handling in valkey, not tenancy.

2. No deprecation cycle on ttl_sec.

Pre-launch — no production deployments to break. ttl_sec is removed, not deprecated. The 6-week notice, 2026-06-07 earliest-removal date, and 3.0.x release framing all come out. CHANGELOG entry becomes a removal note.

Same logic kills the "rename ttl_sec → serve_window_sec" framing. There is no rename; the field is serve_window_sec, period.

3. Adding a pre-launch note to the spec.

Will add a line: pre-launch — no production deployments exist; spec changes are not subject to deprecation cycles until the first 3.0.0 GA release. Signals to reviewers that the freedom to change is real and bounded.

Will push as a separate commit alongside a refreshed PR description that reflects the post-pivot architecture (the current body still reads as a list of things dropped during review rather than describing the current design).

@bokelley
Contributor Author

Got it — three simplifications noted:

  1. fcap_keys format: dimension:value (drop the tenant prefix; charset constraint stays). Multi-tenant operators adopt prefix as a deployment convention.
  2. ttl_sec: removed outright, not deprecated — no notice window needed pre-launch. serve_window_sec is the field, no rename framing.
  3. Pre-launch note: one-line statement that the spec is not yet subject to deprecation cycles until 3.0.0 GA.

Standing by for the commit + refreshed PR description.


Generated by Claude Code

…note

Three simplifications per @bokelley review comment:

1. fcap_keys format: dimension:value (drop required tenant prefix). Multi-tenant
   operators may still use tenant:dimension:value as a deployment convention, but
   the protocol does not mandate it.

2. ttl_sec: removed outright. TMP is pre-launch (experimental, pre-3.0.0 GA)
   and not subject to deprecation cycles. serve_window_sec is the field;
   no rename framing or notice window needed.

3. Pre-launch note: added one-line statement to the Experimental callout in
   specification.mdx that fields on this surface are not subject to deprecation
   cycles until 3.0.0 GA.

https://claude.ai/code/session_01RVevfeAnA9oXcJAkhRjHw6
@bokelley
Contributor Author

All three simplifications landed in commit 9e57653c:

  1. fcap_keys format — changed to dimension:value throughout (campaign:42, advertiser:13). Tenant prefix demoted to a deployment convention note for multi-tenant operators, not a protocol requirement.

  2. ttl_sec — removed outright from schema (identity-match-response.json) and all docs. required array now points to serve_window_sec. No deprecation framing anywhere.

  3. Pre-launch note — added one-liner to the Experimental callout in specification.mdx: "Fields on this surface are not subject to deprecation cycles until 3.0.0 GA."

Also cleaned up stale references in the arch spec (ttl_sec deprecation → removed), buyer-guide (merge_rule semantics link → correct anchor), and the PR description summary.

Note: surface-specific pages (docs/trusted-match/surfaces/*.mdx, index.mdx, context-and-identity.mdx, router-architecture.mdx) still reference ttl_sec in their examples — those are pre-existing, outside this PR's diff, and can be cleaned up in a follow-up sweep.

Triaged by Claude Code. Session: https://claude.ai/code/session_01RVevfeAnA9oXcJAkhRjHw6


Generated by Claude Code
