ci: run AdCP storyboard suite against examples/seller_agent.py on every PR #305

@bokelley

Context

The SDK's reference seller agent (`examples/seller_agent.py`) is the load-bearing demonstration that `adcp.server.serve()` actually emits compliant AdCP responses end-to-end. Any PR can subtly regress storyboard outcomes (response shape, status enums, error codes, capability declarations) without breaking a unit test or the import smoke check. We just hit this in #295/#296, where the SDK transport defaults silently broke the runner for an unknown length of time before anyone noticed.

`.github/workflows/ci.yml` today runs:

  • `pytest` on Python 3.10–3.13
  • `ruff` + `mypy`
  • PgReplayStore tests
  • Public-API import smoke
  • Schema regen + drift check

None of these jobs exercises the storyboard runner, so storyboard regressions slip through silently until a downstream user notices and files an issue (e.g. #295).

Proposal

Add a CI job that:

  1. Boots `examples/seller_agent.py` on a free port (use `ADCP_PORT`).
  2. Runs `npx -y -p @adcp/client@latest adcp storyboard run http://localhost:$PORT/mcp media_buy_seller --json --allow-http` against it.
  3. Captures the JSON, asserts `overall_status: pass` and `controller_detected: true`.
  4. Uploads the JSON as a CI artifact for diagnosis on failure.
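
Step 1 asks for a free port, which the job could obtain from the OS rather than hardcoding. A minimal Python sketch of that, plus a bounded readiness check, follows. The helper names `find_free_port` and `wait_for_port` are hypothetical and not part of the SDK, and the sketch assumes the agent binds to the loopback interface.

```python
import socket
import time


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 30.0) -> bool:
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False
```

The port returned by `find_free_port()` would be exported as `ADCP_PORT` before the agent starts. There is a small window in which another process could claim the port between the probe and the agent's own bind, which is usually acceptable on a single-tenant CI runner.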

Sketch

```yaml
storyboard:
  name: AdCP storyboard runner (examples/seller_agent.py)
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v5
      with: { python-version: "3.12" }
    - uses: actions/setup-node@v4
      with: { node-version: "22" }
    - run: pip install -e ".[dev]"
    - name: Start seller agent
      run: |
        ADCP_PORT=3001 python examples/seller_agent.py &
        # Poll for up to 30 s so a failed boot fails the step instead of hanging.
        for _ in $(seq 1 60); do
          lsof -iTCP:3001 -sTCP:LISTEN >/dev/null 2>&1 && exit 0
          sleep 0.5
        done
        echo "seller agent never listened on port 3001" >&2
        exit 1
    - name: Run storyboard suite
      run: |
        npx -y -p @adcp/client@latest adcp storyboard run \
          http://localhost:3001/mcp media_buy_seller \
          --json --allow-http \
          > storyboard-result.json
    - name: Assert pass
      run: |
        python -c "
        import json, sys
        d = json.load(open('storyboard-result.json'))
        if d.get('overall_status') != 'pass':
            # .get() avoids a KeyError when 'summary' or 'failures' is absent
            print(json.dumps({k: d.get(k) for k in ('overall_status', 'summary', 'failures')}, indent=2))
            sys.exit(1)
        if not d.get('controller_detected'):
            print('controller_detected was false; check DemoStore overrides')
            sys.exit(1)
        "
    - if: always()
      uses: actions/upload-artifact@v4
      with:
        name: storyboard-result
        path: storyboard-result.json
```
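
If the inline assertion grows, it could move into a small checked-in script that can be unit-tested. A minimal sketch follows, assuming the result JSON carries the `overall_status` and `controller_detected` fields named above. The function name `check_storyboard_result` is hypothetical.

```python
def check_storyboard_result(result: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means pass."""
    problems = []
    status = result.get("overall_status")
    if status != "pass":
        problems.append(f"overall_status was {status!r}, expected 'pass'")
    # A missing key is treated the same as an explicit false.
    if not result.get("controller_detected"):
        problems.append("controller_detected was not true; check DemoStore overrides")
    return problems
```

A CI step would load `storyboard-result.json` with `json.load`, pass the dict to this function, print any problems, and exit non-zero if the list is not empty. Collecting both problems in one pass reports a failing status and a missing controller together instead of stopping at the first.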

Sequencing

This job will fail today against `examples/seller_agent.py` on `main` (10 step failures + `controller_detected: false` — see #304). Two ways to land:

  1. Mark the job as required only after the seller-agent gaps tracked in #304 ("examples/seller_agent.py: storyboard content gaps exposed by #296 transport fix") are fixed, so it is green from the start.
  2. Land it as informational (non-blocking), fix the seller agent in parallel via #304, then promote it to required once green.

Option 2 surfaces the regressions immediately and creates pressure to close the gaps.

Out of scope

  • Running storyboards against `examples/minimal_sales_agent.py` (smaller surface; `media_buy_seller` storyboard doesn't apply)
  • Other compliance suites (creative agent, signals agent) — same pattern, separate jobs once the media_buy one is green.
  • The Python equivalent of the JS storyboard runner (none exists today; `@adcp/client` npm package is the canonical runner).
