docs(rfc): v6.0 DecisioningPlatform Python port#290

Open
bokelley wants to merge 4 commits into main from
bokelley/decisioning-platform-v2-rfc

Conversation

@bokelley
Contributor

Summary

Ports the v6.0 DecisioningPlatform framework into a Python package, tracking the TypeScript scaffold landing in adcontextprotocol/adcp-client#1005. RFC reflects rounds 1-3 of feedback (TS scaffold + salesagent operational review).

Single doc, no code yet. Implementation ships after the shape is signed off here.

Adopter groups

  • Salesagent (Flask + SQLAlchemy + Pydantic 2; multi-tenant; per-adapter classes become SalesPlatform impls)
  • Innovid training-agent class (single-tenant 'singleton' resolution)
  • Greenfield Python adopters (async-everywhere FastAPI shape)

Locked decisions

These are settled after operational review unless someone surfaces a constraint we missed:

Method shape — unified hybrid: one method per tool, returns Success | TaskHandoff[Success] or raises AdcpError. No *Task dual methods. Buyer pattern-matches on response shape.

Dispatch — sync methods run via asyncio.to_thread + contextvars.copy_context() propagation; never on the event loop. serve(thread_pool_size=...) exposes pool sizing.
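
A minimal sketch of that dispatch seam, using only the standard library. The `dispatch` helper, the `request_id` context variable, and the pool size are hypothetical names for illustration, not the framework's API. It relies on `asyncio.to_thread` copying the caller's context onto the worker thread, and on `loop.set_default_executor` to make the pool size take effect.

```python
import asyncio
import contextvars
import inspect
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-request state that a sync handler needs to see.
request_id = contextvars.ContextVar("request_id", default="unset")


async def dispatch(method, *args, **kwargs):
    """Await async methods directly; run sync methods off the event loop."""
    if inspect.iscoroutinefunction(method):
        return await method(*args, **kwargs)
    # asyncio.to_thread runs the call in the loop's default executor and
    # propagates a copy of the current contextvars context to the worker.
    return await asyncio.to_thread(method, *args, **kwargs)


def sync_handler():
    # Runs on a worker thread, yet still observes the request's context.
    return request_id.get(), threading.get_ident()


async def main(thread_pool_size=4):
    loop = asyncio.get_running_loop()
    # Assumed wiring for a serve(thread_pool_size=...) style option.
    loop.set_default_executor(ThreadPoolExecutor(max_workers=thread_pool_size))
    request_id.set("req-1")
    value, worker_ident = await dispatch(sync_handler)
    return value, worker_ident, threading.get_ident()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The worker thread's identifier differs from the event-loop thread's, while the context variable set on the loop side is still visible inside the sync handler.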

TaskHandoff brand — plain class with __slots__ = ("_fn",). Framework dispatches via type-identity (type(obj) is TaskHandoff). No WeakValueDictionary — Python's threat model doesn't justify the JS-side ceremony.
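
The unified return shape and the type-identity check can be sketched as follows. `CreateMediaBuySuccess`, the budget threshold, and the `is_handoff` helper are hypothetical stand-ins; only `TaskHandoff`, `__slots__ = ("_fn",)`, `AdcpError`, and the `type(obj) is TaskHandoff` check come from the decisions above.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")


class AdcpError(Exception):
    """Structured error raised by a tool method (shape assumed here)."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


class TaskHandoff(Generic[T]):
    """Marker wrapping deferred work; recognized by exact type only."""

    __slots__ = ("_fn",)

    def __init__(self, fn: Callable[[], T]) -> None:
        self._fn = fn


@dataclass
class CreateMediaBuySuccess:  # hypothetical success payload
    media_buy_id: str


def create_media_buy(
    budget: float,
) -> Union[CreateMediaBuySuccess, TaskHandoff[CreateMediaBuySuccess]]:
    """One method per tool: sync success, handoff, or a raised AdcpError."""
    if budget <= 0:
        raise AdcpError("INVALID_REQUEST", "budget must be positive")
    if budget > 10_000:  # illustrative rule: large buys need human review
        return TaskHandoff(lambda: CreateMediaBuySuccess("mb-approved"))
    return CreateMediaBuySuccess("mb-1")


def is_handoff(obj: object) -> bool:
    # Type identity, not isinstance: subclasses do not count as the brand.
    return type(obj) is TaskHandoff
```

Because the check is `type(obj) is TaskHandoff`, a subclass or a duck-typed lookalike is treated as an ordinary return value rather than as a handoff.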

Task registry: TaskRegistry Protocol with two v6.0 impls: InMemoryTaskRegistry (default) and SqlAlchemyTaskRegistry(engine) (salesagent + adopters with existing SQLA stack). AsyncpgTaskRegistry deferred to v6.1; the Protocol shape supports it additively when a greenfield adopter asks.
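
A rough illustration of why a Protocol makes later backends additive: any class with matching methods satisfies it, with no base class to inherit. The method names (`create`, `complete`, `get`) and the status strings below are assumptions for the sketch; the RFC defines the real surface.

```python
import threading
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class TaskRegistry(Protocol):
    """Structural interface; method names here are hypothetical."""

    def create(self, account_id: str, task_id: str) -> None: ...

    def complete(self, account_id: str, task_id: str, result: Any) -> None: ...

    def get(self, account_id: str, task_id: str) -> Optional[dict]: ...


class InMemoryTaskRegistry:
    """Dict-backed registry keyed by (account_id, task_id)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks: dict = {}

    def create(self, account_id: str, task_id: str) -> None:
        with self._lock:
            self._tasks[(account_id, task_id)] = {"status": "submitted"}

    def complete(self, account_id: str, task_id: str, result: Any) -> None:
        with self._lock:
            self._tasks[(account_id, task_id)] = {
                "status": "completed",
                "result": result,
            }

    def get(self, account_id: str, task_id: str) -> Optional[dict]:
        with self._lock:
            task = self._tasks.get((account_id, task_id))
            # Return a copy so callers cannot mutate registry state.
            return dict(task) if task is not None else None
```

Keying every lookup by account as well as task ID means a task created under one account is invisible to another, even when the task IDs collide.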

Status-change bus — server-scoped only. Three publish surfaces: ctx.publish_status_change(event) inside handlers, server.status_change.publish(event) for code holding a server, TenantRegistry.publish_status_change(tenant_id, event) for cross-tenant code. No module-level singleton.
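
To make "server-scoped, no module-level singleton" concrete, here is a small sketch in which each server object owns its own bus. The `Server` class and the `subscribe` method are hypothetical; `status_change.publish(event)` mirrors the surface named above.

```python
import logging
import threading
from typing import Any, Callable, List

logger = logging.getLogger("status_change_sketch")


class StatusChangeBus:
    """Per-server publish/subscribe bus; nothing lives at module level."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: List[Callable[[Any], None]] = []

    def subscribe(self, callback: Callable[[Any], None]) -> None:
        with self._lock:
            self._subscribers.append(callback)

    def publish(self, event: Any) -> None:
        # Snapshot under the lock, then call subscribers outside it so a
        # slow or re-entrant subscriber cannot block other publishers.
        with self._lock:
            subscribers = list(self._subscribers)
        for callback in subscribers:
            try:
                callback(event)
            except Exception as exc:
                # Log only the error type so event data never reaches logs.
                logger.warning("subscriber failed: %s", type(exc).__name__)


class Server:  # hypothetical stand-in for the object returned by serve()
    def __init__(self) -> None:
        self.status_change = StatusChangeBus()
```

Two servers in the same process do not see each other's events, which is the property a module-level singleton would break.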

Webhook delivery — SSRF validator + pin-and-bind delivery default-on in v6.0. DNS rebinding mitigated at request time, not just validation time. serve(webhook_client=...) allows operator override.
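
The validation half of pin-and-bind could look roughly like this: resolve once, reject the URL if any answer is non-public, and return the address that the delivery step must then connect to. The function name, the allowed-port set, and the returned tuple are assumptions; an actual transport would additionally connect to the pinned IP while keeping the original hostname for TLS SNI and the Host header, and refuse redirects.

```python
import ipaddress
import socket
from typing import Tuple
from urllib.parse import urlsplit


def resolve_and_pin(url: str, allowed_ports=frozenset({443})) -> Tuple[str, str, int]:
    """Return (hostname, pinned_ip, port) or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("webhook URL must use https")
    host = parts.hostname
    if host is None:
        raise ValueError("webhook URL has no host")
    port = parts.port or 443
    if port not in allowed_ports:
        raise ValueError(f"port {port} is not allowed")

    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Strip any IPv6 zone ID ("fe80::1%eth0") before parsing the address.
    addresses = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    # Check every answer, not just the first: one private record among
    # several public ones is enough for a rebinding attack.
    if not addresses or not all(addr.is_global for addr in addresses):
        raise ValueError("host resolves to a non-public address")

    # The caller connects to this exact IP, so a second DNS lookup at
    # request time cannot swap in a different target.
    return host, str(addresses[0]), port
```

Returning the resolved address instead of a boolean is what closes the gap between validation time and request time: the delivery code never resolves the hostname again.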

Idempotency — 7-day default retention (configurable). 4 MB payload cap matching task registry. Framework ships vacuum_idempotency_keys() cleanup function adopters wire into their scheduler.
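
As a sketch of how retention, the payload cap, and the cleanup hook fit together, assuming an in-memory store and interpreting "4 MB" as 4 MiB. `IdempotencyStore` and its `put` method are hypothetical; only the `vacuum_idempotency_keys` name comes from the decision above, and its real signature may differ.

```python
import time
from typing import Dict, Optional, Tuple

RETENTION_SECONDS = 7 * 24 * 60 * 60   # 7-day default retention
MAX_PAYLOAD_BYTES = 4 * 1024 * 1024    # "4 MB" read as 4 MiB (assumption)


class IdempotencyStore:
    """Hypothetical in-memory store: key -> (stored_at, payload)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, payload: bytes, now: Optional[float] = None) -> None:
        if len(payload) > MAX_PAYLOAD_BYTES:
            raise ValueError("payload exceeds idempotency cache cap")
        stored_at = time.time() if now is None else now
        self.rows[key] = (stored_at, payload)


def vacuum_idempotency_keys(
    store: IdempotencyStore,
    now: Optional[float] = None,
    retention_seconds: float = RETENTION_SECONDS,
) -> int:
    """Delete entries older than the retention window; return the count."""
    current = time.time() if now is None else now
    cutoff = current - retention_seconds
    expired = [key for key, (stored_at, _) in store.rows.items() if stored_at < cutoff]
    for key in expired:
        del store.rows[key]
    return len(expired)
```

An adopter would call the vacuum function from whatever scheduler they already run (cron, Celery beat, APScheduler), since the framework ships the function but not the schedule.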

Wire types — Pydantic v2 BaseModel. Env-driven extra policy default — production: 'ignore' (forward-compat), dev: 'forbid' (catch typos). Read from ADCP_ENV; safe default 'dev'.
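
The env-driven default as described in this item can be sketched with the standard library alone. The helper name and the literal value `"production"` for `ADCP_ENV` are assumptions; the returned string is what would be passed to Pydantic v2's `ConfigDict(extra=...)`.

```python
import os
from typing import Mapping, Optional


def resolve_extra_policy(environ: Optional[Mapping[str, str]] = None) -> str:
    """Map ADCP_ENV to a Pydantic `extra` policy string.

    Unset or unrecognized values fall back to the dev policy, so a
    missing variable fails loudly on unknown fields.
    """
    env = os.environ if environ is None else environ
    adcp_env = env.get("ADCP_ENV", "dev")
    # production: tolerate unknown fields for forward compatibility;
    # anything else: reject them so typos surface immediately.
    return "ignore" if adcp_env == "production" else "forbid"


# Intended use with Pydantic v2 (not imported here to stay stdlib-only):
#   model_config = ConfigDict(extra=resolve_extra_policy())
```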

Multi-tenancy: TenantRegistry with subdomain + path-prefix routing. Per-tenant health: 'pending' | 'healthy' | 'unverified' | 'disabled'. register() lands tenants in 'pending'; resolve_by_host returns None until first JWKS validation succeeds.
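
The pending-until-verified behaviour might be sketched like this. The `Tenant` dataclass and the `mark_jwks_verified` method are hypothetical, and the sketch assumes only 'healthy' tenants resolve; how 'unverified' is treated is not spelled out in this summary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Tenant:
    tenant_id: str
    host: str
    health: str = "pending"  # 'pending' | 'healthy' | 'unverified' | 'disabled'


class TenantRegistry:
    """Host-based lookup that hides tenants until JWKS validation passes."""

    def __init__(self) -> None:
        self._by_host: Dict[str, Tenant] = {}

    def register(self, tenant_id: str, host: str) -> Tenant:
        tenant = Tenant(tenant_id=tenant_id, host=host.lower())
        self._by_host[tenant.host] = tenant
        return tenant

    def mark_jwks_verified(self, host: str) -> None:
        # Hypothetical hook called after the first successful validation.
        self._by_host[host.lower()].health = "healthy"

    def resolve_by_host(self, host: str) -> Optional[Tenant]:
        tenant = self._by_host.get(host.lower())
        if tenant is None or tenant.health != "healthy":
            return None
        return tenant
```

A freshly registered tenant is therefore unroutable by default, so a misconfigured JWKS endpoint results in no traffic being served instead of unauthenticated traffic.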

Account resolution: 'explicit' | 'from_auth' | 'singleton' (renamed from v1's 'implicit' | 'derived' for clarity).

HTTP signatures: http-message-signatures (woodruffw), the most actively maintained pure-Python RFC 9421 impl.

Library naming: adcp-server on PyPI. Same ADCP_VERSION pin as @adcp/client.

Python versions — 3.10 minimum; CI 3.10 / 3.11 / 3.12 / 3.13. PEP 696 (TypeVar defaults) via typing_extensions on 3.10-3.12.

Spec consolidation (adcp#3392) — hybrid handoff in v6.0 only on create_media_buy + sync_creatives (the two tools whose xxx-response.json includes the Submitted arm). Other 4 HITL-eligible tools surface lifecycle via publish_status_change until #3392 lands.

Open for discussion

The locked decisions are settled, but adopter-specific wiring isn't. Comments are welcome from anyone planning to use this:

  • Innovid migration shape. Single-tenant 'singleton' adopter — does the training-agent example match what your migration looks like? Anything missing in the resolver/auth_info threading?
  • Greenfield non-SQLA adopters. The asyncpg deferral is comfortable because we don't know of one. If you're planning to ship a Python adopter on a non-SQLA async stack (asyncpg directly, Tortoise, SQLModel, etc.), comment so we can prioritize the v6.1 impl.
  • update_media_buy HITL workaround. Until adcp#3392 lands, re-approval flows return UpdateMediaBuySuccess synchronously with status='pending_approval' and drive lifecycle via publish_status_change. If your re-approval workflow can't fit that shape, comment with the constraint.
  • Migration timeline. Salesagent estimates 2-3 calendar months for the full migration, in 4 stages. If you're a smaller adopter, it would help to hear what your timeline looks like once you've read the migration section.

Out of scope

  • Per-adopter migration of GAM / Kevel / scope3 / Innovid adapters
  • MCP Resources subscription wire projection (parked behind AdCP 3.1)
  • New protocol shapes (this RFC adds zero wire surface beyond AdCP 3.0 GA)

Test plan

  • Adopter teams (salesagent, Innovid) review and confirm the shape works for their migration
  • No design-level objections from greenfield adopters surveyed via this thread
  • Once locked, follow-on PRs ship: framework primitives → 12 Protocol classes → tenant_registry → task_registry impls → worked examples → CI parity tests

🤖 Generated with Claude Code

bokelley and others added 4 commits April 28, 2026 12:44
Ports the v6.0 framework primitives — server factory, dispatch seam,
idempotency, signing, validation, sandbox, status-change projection,
lifecycle observability — into a Python package targeting salesagent
(Flask + SQLAlchemy multi-tenant) and Innovid training-agent
(single-tenant) as primary adopters.

Tracks the TS scaffold landing in adcontextprotocol/adcp-client#1005;
this RFC reflects rounds 1-3 of feedback (TS scaffold + salesagent
operational review). Locked decisions cover sync-method dispatch via
asyncio.to_thread, TaskHandoff brand simplification, TaskRegistry
Protocol with InMemory + SQLAlchemy v6.0 impls (asyncpg deferred to
v6.1), pin-and-bind webhook delivery default-on, server-scoped
status-change bus, and env-driven Pydantic extra policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses the 32 findings from the 7-expert review (protocol, adtech-product,
python, security, agentic-product, dx, javascript-protocol) on PR #290:

Wire-correctness:
- Drop status='pending_approval' from update_media_buy workaround (not in
  MediaBuyStatus enum); recommend omitting the optional status field instead
- Add operation_id to WebhookPayload (required by mcp-webhook-payload.json)
- Add A2A delivery section (Task + TaskStatusUpdateEvent) so hybrid handoff
  isn't silently MCP-only
- Drop hardcoded SQL CHECK on task status; validate at the Python boundary
  so spec evolution doesn't require migration

Security:
- Fix pin-and-bind webhook transport: preserve TLS SNI on host rewrite,
  reject redirects, check all getaddrinfo answers, reject disallowed ports,
  handle IPv6 zone IDs
- Add WebhookTransport Protocol so operator overrides can't silently bypass
  SSRF defenses
- Validate cross-tenant event.account_id belongs to tenant_id before
  forwarding (closes MCP Resources subscriber leak)
- Singleton-mode AccountStore synthesizes per-principal Account.id (closes
  buyer-to-buyer idempotency-cache leak)
- Flip extra='forbid' default in all environments; ADCP_FORWARD_COMPAT
  is opt-in for spec-rev rollouts
- Tenant-scope JWKS resolver: (tenant_id, key_id) -> jwk; reject keys
  outside the active tenant's JWKS
- Pin RFC 9421 covered components (@method, @target-uri, @authority,
  content-digest, created, expires); 60s skew; nonce cache for
  non-mutating tools

Python correctness:
- Switch TenantConfig from TypedDict+Generic (TypeError on 3.10) to
  dataclass(slots=True)
- Drop contextvars.copy_context().run(**kwargs) ceremony — to_thread
  already snapshots context, and Context.run rejects kwargs
- Install custom ThreadPoolExecutor via loop.set_default_executor for
  serve(thread_pool_size=...) to actually take effect
- Lock StatusChangeBus._subscribers under threading.Lock; log error
  type only to avoid leaking tenant data through warning lines

Cross-language pins:
- Pin idempotency keying tuple to (idempotency_key, account_id,
  tool_name, sha256(canonical_json(body))); same-key-different-body
  returns INVALID_REQUEST, not silent-replay
- Source account-resolution rename to TS PR #1005 with conversion table

Adopter shape:
- Track adcp#3392 as v6.1 release blocker, not deferrable; document
  tasks/submit projection as alternate path if spec consolidation stalls
- Split TenantHealth into orthogonal verification + operator_gate axes
- Ship DbBackedStatusChangeBus for audit-relevant deployments; label
  in-memory variant dev-only
- Pick separate Alembic version_table='adcp_alembic' (no migration-tree
  merging)
- Drop tenant-prefix workaround for missing composite PK; use
  PK (account_id, task_id) instead

DX scaffolding:
- examples/hello_seller.py reference (runnable 30-line file)
- Consolidated serve() configuration reference table
- Decision tables for account-resolution mode + publish_status_change surface
- Type aliases MaybeAsync[T] and SalesResult[T] replace inline four-way
  unions for coding-agent legibility
- Full-signature Protocol stubs (no ...-bodied placeholders) with
  per-method specialism gating in docstrings
- adcp_server.testing.make_test_context fixture
- adcp_server.dev.JwksFixture for local-dev signed-request testing
- TenantResolution dataclass replaces tuple-index [2] return

Decision summary, "what changed since v1" table, and validation matrix
updated to reflect all of the above.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Call out RFC 8785 JCS canonicalization as load-bearing implementation
  work (~3-4 days); add tests/test_jcs_parity.py for byte-equal hash
  parity between Python and TypeScript
- Split webhook tests into MCP and A2A arms in the validation matrix so
  both envelope shapes are covered against TS golden files
- Promote outbox-poll as DbBackedStatusChangeBus default (works on PG +
  MySQL + SQLite); LISTEN/NOTIFY is a Postgres opt-in optimization
- Document the buyer-side polling visibility cost of the
  update_media_buy re-approval workaround: poll-only buyers see no
  status diff while the update is in-flight (push-only via subs until
  adcp#3392)
- Update effort estimate: ~18-21 weeks focused engineering for the
  framework, ~6-8 months runway to "salesagent on v6.0 in production"
- Add operator runbook for the extra='forbid' default: how to detect
  and respond to INVALID_REQUEST spikes during spec-rev rollouts via
  ADCP_FORWARD_COMPAT=permissive

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous draft proposed a separate adcp-server package on PyPI,
written before checking the existing Python package state. The adcp
package at v4.0.0 already ships:

- RFC 9421 signing (adcp.signing.signer/verifier/jws/digest)
- JCS canonicalization (adcp.signing.canonical, backed by rfc8785)
- IP-pinned transport (adcp.signing.ip_pinned_transport)
- JWKS resolver (adcp.signing.jwks)
- Replay protection (adcp.signing.replay)
- Idempotency middleware (adcp._idempotency, adcp.server.idempotency/)
- Generated wire types from schemas/cache/3.0.0/ (adcp.types)
- MCP + A2A transports (adcp.server.mcp_tools, adcp.server.a2a_server)
- Existing ADCPHandler class-pattern + serve() entry point

A separate package would duplicate half the foundation. Reframe v6.0
DecisioningPlatform as a successor pattern to ADCPHandler that lands at
adcp.decisioning.* inside the existing package, reusing the primitives
above and adding only what's genuinely new (Protocol-driven specialism
shape, TaskHandoff hybrid return, multi-tenant primitives, HITL task
registry, status-change bus, validate_platform).

Substantive changes:
- New "Existing adcp Python package" + "Module path" + "Reuse vs.
  build" sections in Background
- All adcp_server.* import paths replaced with adcp.* / adcp.decisioning.*
- Open question 3 reframed: in-package landing recommended; rejection
  of separate adcp-server package with three reasons
- Decision summary item 11: packaging is in-package at v5.0.0
- "What changed since v1" gains a row for the packaging reframe
- Stage 1 of salesagent migration: bump adcp pin, no install of new
  package; existing adcp.signing / adcp._idempotency primitives carry
  over unchanged
- Effort estimate compresses from ~18-21 weeks to ~10-13 weeks
  (audit-and-fix existing primitives + add new layers); calendar
  ~3-4 months → total runway ~4-6 months
- Next moves: 6 audit issues against existing modules + 12 new-layer
  issues, replacing the prior "create new repo" step

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>