ADR-023: pi-agent harness and wasmtime/c2w containers #100

Draft

wmeddie wants to merge 17 commits into main from spike/pi-harness

Conversation

wmeddie commented Apr 20, 2026

Summary

Draft PR for ADR-023, which sets the direction for a clean re-do of the container-isolation work after spike/wanix-agents (PR #98) drifted out of scope.

The ADR reframes Xpressclaw as an explicit meta-harness, locks in wasmtime + container2wasm as the sole container runtime (embedded like llama-cpp), removes Docker support entirely (supersedes ADR-003 in full), distributes harness images via GHCR, and makes the sidecar the single LLM endpoint for every harness so budget-aware transparent downgrade to local inference is architectural, not per-harness.

V1 ships only two harnesses — PiHarness (new) and a retrofitted built-in Claude-SDK harness — with seven concrete MVP exit criteria so the re-do has an anti-drift lever.

What this PR is not

Key decisions captured

  • WASM runtime: wasmtime (c2w's first-class host, Rust-native, WASIp2 stable, epoch interruption as the rollback primitive)
  • Container runtime: container2wasm (c2w), no Docker fallback
  • Image distribution: GHCR (ghcr.io/xpressai/harnesses/*)
  • LLM routing: all harnesses talk to the sidecar only; sidecar owns provider keys and budget routing
  • First-class harnesses in V1: pi + retrofitted built-in; codex/opencode deferred
  • Supersedes ADR-003 in full; touches ADR-002, ADR-005, ADR-010, ADR-013, ADR-017

Review focus

  • Do the MVP exit criteria (7 items) actually capture "direction validated"?
  • Is there anything I should have made a non-negotiable that's listed as an open question, or vice versa?
  • Pi's license posture and the shell-bridge streaming question are flagged as open — if you already know the answers, tell me and I'll fold them in before merging.

Test plan

  • Read ADR end-to-end
  • Confirm MVP criteria are the right proof points
  • Sanity-check "supersedes ADR-003 in full" against any Docker-dependent workflows we currently support

wmeddie added 2 commits April 20, 2026 18:44
Establishes the direction for the next spike attempt after
spike/wanix-agents drifted out of scope. Xpressclaw is reframed as an
explicit meta-harness, Docker is removed in full (supersedes ADR-003),
c2w on wasmtime becomes the sole container runtime embedded like
llama-cpp, GHCR is the image distribution channel, and the sidecar
becomes the single LLM endpoint for every harness so budget-aware
transparent downgrade to local inference is an architectural property.
V1 scope is two harnesses (pi + retrofitted built-in) and seven MVP
exit criteria, so we have something to hold the next attempt
accountable to.
wmeddie added 2 commits April 20, 2026 19:19
First slice of the ADR-023 meta-harness refactor. Defines the `Harness`
trait in `xpressclaw-core::harness`, implements it for the existing
`DockerManager` so the rest of the codebase gains a trait-object path
without any behavior change, and exposes an `AppState::harness()`
accessor alongside the existing `AppState::docker()`.

The trait is deliberately narrow — only the lifecycle + endpoint +
observability surface the reconciler, task dispatcher, and message
processor actually consume. Image management, snapshotting, and
tmux-attach ship in follow-up commits (tasks 2/3/8/9).

No callers migrated yet. This is scaffolding; subsequent commits in
this task swap `AppState::docker()` users over to
`AppState::harness()` one module at a time.
Replaces two direct DockerManager::connect() sites with trait-object
calls via state.harness() and endpoint_port(agent_id). Removes two
per-request fresh Docker connections — the harness() accessor reuses
the shared cached connection.

This also proves the Harness trait in hot paths (message streaming +
cancel) before landing the c2w implementation. Reconciler, task
dispatcher, and apps routes stay on DockerManager for now; they
migrate once C2wHarness exists and the switch has architectural
payoff instead of being a mechanical rename.
Embeds wasmtime 27 + wasmtime-wasi as workspace deps and adds a new
xpressclaw_core::c2w module with the low-level runtime primitive that
task 3 (C2wHarness) will compose with.

C2wRuntime wraps wasmtime Engine with:
- async_support for tokio-driven guests
- epoch_interruption enabled; a background Tokio task ticks the
  engine epoch counter every 50ms so guests can be deadline-aborted
  (this is the rollback primitive)
- backtrace capture for diagnostics
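
The tick-driver pattern above can be sketched with std primitives alone. This is a minimal stand-in, not the actual `C2wRuntime` API: an `AtomicU64` plays the role of wasmtime's engine epoch counter, and the `EpochTicker` name is illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Illustrative stand-in for wasmtime's engine epoch counter.
struct EpochTicker {
    epoch: Arc<AtomicU64>,
}

impl EpochTicker {
    /// Spawn a background thread that bumps the counter every `period`,
    /// mirroring the 50ms tick task described in the commit.
    fn start(period: Duration) -> Self {
        let epoch = Arc::new(AtomicU64::new(0));
        let e = Arc::clone(&epoch);
        thread::spawn(move || loop {
            thread::sleep(period);
            // Real code calls engine.increment_epoch() here.
            e.fetch_add(1, Ordering::SeqCst);
        });
        EpochTicker { epoch }
    }

    fn current(&self) -> u64 {
        self.epoch.load(Ordering::SeqCst)
    }
}

fn main() {
    let ticker = EpochTicker::start(Duration::from_millis(5));
    thread::sleep(Duration::from_millis(50));
    // A guest whose deadline is at or below current() would be interrupted by now.
    assert!(ticker.current() > 0);
    println!("epoch = {}", ticker.current());
}
```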

InstanceSpec + C2wInstance model the per-agent guest: env, preopened
dirs (for the host-guest filesystem bridge), WASI args, optional
epoch deadline. MVP exposes run_to_completion only — long-lived
instances with an HTTP endpoint land in task 3.

Target WASI version is preview 1 because c2w emits preview-1 modules
today. If c2w adopts preview 2 we switch; the module layering is
designed to isolate that choice.

Tests: unit test confirms the runtime constructs and rejects bogus
module bytes. End-to-end (running an actual c2w-compiled image)
requires having an image built, which lands in task 3.
Adds `C2wHarness` — a `Harness` implementation backed by the
`C2wRuntime` primitive from task 2 — and a `xpressclaw c2w-smoke`
subcommand that exercises the full lifecycle on a real machine.

C2wHarness scope:
- Per-agent tokio tasks drive each guest to completion (or forever,
  for long-lived harnesses).
- Lifecycle: launch/stop/stop_all/list/is_running/uptime_secs work.
- `ContainerSpec.image` is interpreted as a filesystem path to a
  prebuilt WASM module — GHCR pulling + OCI-to-WASM conversion ships
  in task 4 without changing call sites.
- Stop aborts the driver task directly (relying on the epoch deadline
  alone would add up to 50ms of latency).

Deferred to follow-up commits in this series:
- Real stdout/stderr capture into the returned Harness::logs buffer.
  Right now guests inherit the process stdio; `logs()` returns empty
  strings. Requires threading MemoryOutputPipe through C2wInstance;
  minor refactor of the primitive's API so lands cleanly as 3b.
- Host-side port exposure (endpoint_port returns None). Requires
  wasi-sockets plumbing that only becomes real with a HTTP-serving
  harness image — ships with task 4 alongside PiHarness.

The `xpressclaw c2w-smoke` subcommand builds a noop WASI guest from
embedded WAT and launches it through C2wHarness end-to-end. This is
the first thing a developer can actually run to verify the runtime
works on their machine. Subcommand is intentionally temporary — task 4
adds `xpressclaw harness add/run` that subsumes this with a real
workload.

Non-obvious bug fixed while writing this: wasmtime's epoch deadline
is absolute, not relative, and the background tick driver advances the
engine's counter from runtime construction. A store-default deadline
of 0 therefore traps the first time the counter ticks past 0 —
basically immediately. Fixed by setting a huge default deadline
(u64::MAX/2 ≈ 14 billion years at 50ms/tick); caller-specified
deadlines land as a follow-up when snapshot/rollback (task 8) needs
precise per-step budgets.
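
The "14 billion years" figure checks out arithmetically; a quick sketch, pure arithmetic with no wasmtime involved:

```rust
fn main() {
    // Default deadline chosen to dodge the absolute-deadline trap.
    let deadline_ticks: u64 = u64::MAX / 2;
    // The background driver advances the counter every 50ms.
    let seconds = deadline_ticks as f64 * 0.050;
    let years = seconds / (365.25 * 24.0 * 3600.0);
    // ~1.46e10 years, i.e. on the order of 14 billion.
    assert!(years > 1.0e10 && years < 2.0e10);
    println!("{:.2e} years", years);
}
```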

Tests: c2w unit test (runtime smoke) + harness unit test
(launch-and-run noop wasm) both pass. Manual verification via the
c2w-smoke CLI also passes end-to-end on macOS.
Adds the pi-agent harness layer on top of C2wHarness. Pi-specific
conventions baked into every launch:

- Per-agent workspace dir created on the host and preopened into the
  guest at /workspace.
- OPENAI_API_BASE env var defaulted to http://127.0.0.1:8935/v1 so
  pi-compiled guests hit the xpressclaw sidecar (task 6 makes this
  endpoint real).
- XCLAW_SOCKET env var defaulted to /run/xclaw.sock so pi's shell
  verbs know where to find the bridge (task 5 wires it).
- Caller-provided env/volumes win on conflict — PiHarness only fills
  in defaults.

HarnessImageResolver is scaffolded with a file-path-only resolve().
OCI refs (ghcr.io/xpressai/harnesses/pi:tag) return a clear error
that names task 4b. That task is in the tracker; resolver interface
is stable so the eventual fill-in doesn't churn call sites.

Two unit tests cover the pi launch path and the OCI-stub error. A
new `xpressclaw pi-smoke` CLI subcommand runs the whole layer
end-to-end with a noop WASM guest — creates the workspace, seeds
env, launches, waits for exit, cleans up. Companion to c2w-smoke;
both get removed when task 5/6 deliver the real pi flow.
Implements the xclaw shell bridge that lets non-MCP harnesses (pi and
friends) talk to xpressclaw as if they were using shell commands
instead of MCP tools (ADR-023 §7).

Wire format (xpressclaw_core::xclaw):
- Newline-delimited JSON over a Unix socket. One connection = one
  request, one response, close. Debuggable with socat/nc.
- Verbs are dot-separated (memory.add, memory.list, version); args
  are a flat JSON object; agent_id rides along for per-agent
  attribution.
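
One request line under that wire format could look like the following. This is a hedged, std-only sketch: the real crate presumably serializes with serde, and exact field ordering is an assumption; only the field names (`verb`, `args`, `agent_id`) come from the description above.

```rust
/// Build one xclaw request line (illustrative; the real client likely
/// serializes with serde rather than format!).
fn request_line(verb: &str, args_json: &str, agent_id: &str) -> String {
    format!(
        "{{\"verb\":\"{}\",\"args\":{},\"agent_id\":\"{}\"}}\n",
        verb, args_json, agent_id
    )
}

fn main() {
    let line = request_line("memory.add", "{\"content\":\"hi\"}", "agent-1");
    // One request per line; the connection closes after the response.
    assert!(line.ends_with('\n'));
    assert!(line.contains("\"verb\":\"memory.add\""));
    print!("{line}");
}
```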

Server listener (xpressclaw_server::xclaw_bridge):
- Bound at <data_dir>/run/xclaw.sock during server startup.
- Current verbs: version, memory.add, memory.list. More verbs
  (task.*, budget, log, ask) follow the same shape and drop in
  without protocol changes.
- Unix-only; on Windows the start() fn logs a warning and no-ops
  (WASM guests don't care about host OS, only the host socket does).

Client (xpressclaw-cli produces a second binary `xclaw`):
- argv → verb+args parser: `xclaw memory add --content "hi" --tags a,b`
  becomes {verb: "memory.add", args: {content, tags}}.
- Reads XCLAW_SOCKET + XCLAW_AGENT_ID env vars (set by PiHarness
  from task 4's constants).
- Exit codes: 0=success, 1=server-side verb failed, 2=transport,
  3=usage. Scripts in guests can branch on these.
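
The argv-to-verb mapping and the usage exit code could be approximated as below. A sketch only: function and error shapes are hypothetical, not the real xpressclaw-cli parser.

```rust
/// Map `xclaw memory add --content hi` style argv onto (verb, flag args).
/// Illustrative only -- the real parser lives in xpressclaw-cli.
fn parse_argv(argv: &[&str]) -> Result<(String, Vec<(String, String)>), i32> {
    let mut words = Vec::new();
    let mut args = Vec::new();
    let mut i = 0;
    while i < argv.len() {
        if let Some(key) = argv[i].strip_prefix("--") {
            // Flag value is the next token; a missing value is a usage error (exit 3).
            let val = argv.get(i + 1).ok_or(3)?;
            args.push((key.to_string(), val.to_string()));
            i += 2;
        } else {
            words.push(argv[i]);
            i += 1;
        }
    }
    if words.is_empty() {
        return Err(3); // usage error
    }
    // Dot-separated verb: ["memory", "add"] -> "memory.add"
    Ok((words.join("."), args))
}

fn main() {
    let (verb, args) =
        parse_argv(&["memory", "add", "--content", "hi", "--tags", "a,b"]).unwrap();
    assert_eq!(verb, "memory.add");
    assert_eq!(args[0].0, "content");
    assert_eq!(args[0].1, "hi");
    assert_eq!(parse_argv(&[]).unwrap_err(), 3);
}
```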

Tests: protocol roundtrip in core, transport roundtrip in server
(real UnixStream + UnixListener), argv parser in the client. No
end-to-end smoke CLI yet — manually runnable today by starting
`xpressclaw up` and invoking `XCLAW_SOCKET=<data>/run/xclaw.sock
xclaw version` from another terminal; add an automated smoke once
PiHarness (task 4) mounts the socket into a live guest.

Verb coverage is deliberately narrow for this commit. Remaining
verbs in ADR-023 §7 (task.create/update/status/list, budget, log,
ask) follow the same dispatch pattern and land in follow-up commits
as each verb's backing API is needed.
…023 task 6)

Extends the existing OpenAI-compatible endpoint at /v1/chat/completions
into the single LLM entry point every harness talks to per ADR-023 §6.
The endpoint already had agent_id extraction and degraded-model
override; this commit fills the remaining gaps:

- Hard-stop enforcement. If the agent is paused or `on_exceeded: stop`
  over limit, the request is refused with HTTP 429. `alert` mode is
  logged but lets the request through.
- Streaming support. `{"stream": true}` routes through chat_stream()
  and returns OpenAI-style SSE with chunks passed through unmodified
  and a terminal `data: [DONE]`.
- Token usage recording. On non-streaming completion, pulls the
  `usage` field off the provider response and writes a `usage_logs`
  row via `CostTracker`, then updates `BudgetManager` spend. On
  streaming, tokens are counted approximately (chars/4 for output,
  0 for prompt) since `ChatCompletionChunk` doesn't carry usage;
  proper streaming accounting is follow-up task 11.
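
The interim streaming accounting (chars/4 for output, 0 for prompt) is simple enough to sketch directly; the function name here is illustrative:

```rust
/// Rough token accounting for streamed output, per the commit: ~4 chars
/// per token for output, prompt tokens unknown and counted as 0.
fn approx_stream_usage(streamed_text: &str) -> (u64, u64) {
    let output_tokens = (streamed_text.chars().count() as u64) / 4;
    (0, output_tokens) // (prompt_tokens, completion_tokens)
}

fn main() {
    let (prompt, completion) = approx_stream_usage("hello world, streamed!");
    assert_eq!(prompt, 0);
    assert_eq!(completion, 22 / 4); // 22 chars -> 5 approximate tokens
    println!("~{completion} output tokens");
}
```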

Three new unit tests use a canned in-crate provider:
- records_usage_for_agent: row lands in usage_logs after a request
- works_without_auth: still 200, no usage recorded
- honors_degraded_model_override: seeded degraded_model in budget_state
  makes the provider see `"local"` instead of the caller's
  `"canned-model"` — the ADR-023 §6 "transparent downgrade" promise.
Completes ADR-023 decision 4. Docker is no longer a dependency; agent
workloads run on wasmtime + c2w exclusively.

## What was deleted

- `crates/xpressclaw-core/src/docker/` module (manager, images, bollard
  plumbing) — 984 lines gone.
- `crates/xpressclaw-core/src/runtime.rs` — dead-code orchestrator (no
  external callers).
- `bollard` workspace + core dependency.
- `impl Harness for DockerManager` scaffolding from harness/mod.rs.
- `AppState::docker()` accessor and the docker field on AppState.
- Docker check in `xpressclaw init` and `xpressclaw up`.
- Graceful-shutdown Docker pass in server.rs (replaced with
  `harness.stop_all()`).
- `impl From<bollard::errors::Error> for Error`.

## What stayed but got re-wired

- `ContainerSpec` / `ContainerInfo` / `VolumeMount` types moved from
  `docker/manager.rs` to `harness/types.rs`. They describe the generic
  launch contract and have no Docker specifics.
- `AppState::harness()` now returns the stored `Arc<dyn Harness>`
  directly instead of wrapping DockerManager.
- Server routes (agents, conversations) use `state.harness()` for
  live status / port lookup / stop / logs.

## What got stubbed pending follow-up

- `agents::reconciler` retains only its Ollama-model reconciliation.
  Agent container launching / orphan-task requeuing is paused until
  task 10 (GHCR pull) lands the real launch path.
- `tasks::dispatcher::load_task` early-returns Requeue — the remainder
  of the state machine needs an Arc<dyn Harness> handle which lands
  with task 10.
- `routes::apps` — app-container endpoints (launch / logs / proxy)
  return 503. Agent-app containers need their own ADR post-spike;
  out of scope here.
- `routes::setup::check_docker` and `start_docker` kept as
  compatibility stubs that report `removed: true`. Task 12 rips the
  frontend setup-wizard step out.

## ADR / docs

- ADR-003 marked "Superseded by ADR-023".
- CLAUDE.md updated: container runtime is now "wasmtime +
  container2wasm", not Docker.

## Scope boundaries (deliberately unchanged)

- Agent session / message-processor paths: already trait-object via
  `AppState::harness()` from task 1's migration; they gracefully
  handle `None` so the spike branch still compiles without real
  agents running.
- LLM sidecar (/v1/chat/completions): untouched — task 6 work stands.
- xclaw bridge: untouched — task 5 work stands.

## Diff summary

-2860 / +203 = net -2657 lines.
All 327 core + 53 server library tests pass. Workspace builds clean
(clippy: no errors; a few warnings about now-unused args on the
stubbed app/dispatcher paths — intentionally left with `_` prefixes
so they're easy to re-wire in task 10).
Adds the rollback-on-failure plumbing that backs MVP criterion 7 of
ADR-023 ("rogue `rm -rf /` → automatic rollback, host unaffected").

## Trait surface

`Harness` gains three new methods, all with sensible defaults:

- `snapshot(agent_id) -> SnapshotId` — capture the guest's persistent
  state so a future `restore` can roll it back.
- `restore(agent_id, &SnapshotId)` — revert persistent state to a
  prior snapshot.
- `delete_snapshot(&SnapshotId)` — free the snapshot's backing
  storage.

The default implementations return "not supported" for `snapshot` and
`restore` and no-op for `delete_snapshot`. Harnesses that can persist
guest state override as needed.
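
The default-method shape described above can be sketched as follows. Types are simplified (plain `String` errors, `SnapshotId` as a string alias); the real trait carries more methods and richer error types.

```rust
/// Simplified sketch of the snapshot surface; not the real trait.
type SnapshotId = String;

trait Harness {
    fn snapshot(&self, _agent_id: &str) -> Result<SnapshotId, String> {
        Err("snapshot not supported by this harness".into())
    }
    fn restore(&self, _agent_id: &str, _id: &SnapshotId) -> Result<(), String> {
        Err("restore not supported by this harness".into())
    }
    fn delete_snapshot(&self, _id: &SnapshotId) -> Result<(), String> {
        Ok(()) // default is a no-op
    }
}

/// A harness with no persistent guest state keeps all the defaults.
struct NoPersistence;
impl Harness for NoPersistence {}

fn main() {
    let h = NoPersistence;
    assert!(h.snapshot("agent-1").is_err());
    assert!(h.restore("agent-1", &"snap".to_string()).is_err());
    assert!(h.delete_snapshot(&"snap".to_string()).is_ok());
}
```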

## C2wHarness implementation

Tracks the preopen list per running agent; `snapshot` copies each
preopened directory to `<cache_dir>/snapshots/<uuid>/<index>/`.
`restore` stops-and-replaces the original directories from the
snapshot copy. `delete_snapshot` rm -rfs the backing dir.
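
The copy step can be sketched with a std-only recursive directory copy. Assumptions are labeled in comments: this is not the C2wHarness code, just a runnable model of the snapshot/restore roundtrip with symlinks skipped.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy a preopened directory into a snapshot slot.
/// Sketch only: regular files + directories, symlinks skipped.
fn copy_dir(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let ty = entry.file_type()?;
        let to = dst.join(entry.file_name());
        if ty.is_dir() {
            copy_dir(&entry.path(), &to)?;
        } else if ty.is_file() {
            fs::copy(entry.path(), &to)?;
        }
        // Symlinks fall through: neither followed nor copied.
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let base = std::env::temp_dir().join("xclaw-snap-demo");
    let (src, snap) = (base.join("workspace"), base.join("snapshots/0"));
    let _ = fs::remove_dir_all(&base);
    fs::create_dir_all(src.join("sub"))?;
    fs::write(src.join("sub/file.txt"), b"state")?;
    copy_dir(&src, &snap)?; // snapshot
    fs::write(src.join("sub/file.txt"), b"clobbered")?; // rogue mutation
    // restore = stop-and-replace the original dir from the snapshot copy
    fs::remove_dir_all(&src)?;
    copy_dir(&snap, &src)?;
    assert_eq!(fs::read(src.join("sub/file.txt"))?, b"state");
    Ok(())
}
```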

Scope honestly reflected in the code:
- Snapshot covers filesystem (preopens), not in-flight WASM memory
  or tmux session pty. That matches the ADR's "drop the `Store` and
  restart" model — the guest is expected to be re-instantiated
  after restore.
- `restore` reverts filesystem but doesn't re-launch the guest —
  the task-dispatcher caller (task 10) decides whether to stop/start.
- Symlinks inside a preopen aren't followed during copy; snapshots
  contain regular files + directory structure.

## Test + CLI smoke

New unit test `snapshot_and_restore_roundtrip_workspace` seeds a file
in a preopened dir, snapshots, mutates, restores, and asserts the
mutations are reverted.

New `xpressclaw rollback-smoke` subcommand runs the same flow from
the CLI — launches a c2w guest, simulates a rogue tool call
rewriting the workspace, restores the snapshot, and verifies the
filesystem is reverted. Prints a step-by-step narration that ends
with `Smoke test passed.` on success.

This is the seventh MVP exit criterion made runnable. Wiring into
the real task dispatcher (pre-step snapshot, on-failure restore)
lands when task 10 gives the dispatcher an `Arc<dyn Harness>` handle.
…k 9)

Surfaces two ADR-023 features in the conversation page: the
transparent budget downgrade from task 6 becomes visible, and the
tmux-attach entry point from a future pi harness has its UI slot
reserved.

## Backend

- `Harness::attach_tmux(agent_id)` added with default `None`. Concrete
  harnesses (pi, future shell-native backends) override it to return
  their tmux session descriptor.
- New `TmuxAttach { session_name, socket_path }` type.
- `GET /api/agents/:id/tmux` returns `{ available, session? }`.
- `GET /api/budget/:id` now includes `degraded_model` and `is_paused`
  alongside the existing `BudgetSummary` fields so the UI can render
  the downgrade chip without a second call.

## Frontend

- New API types `AgentBudgetState` and `AgentTmuxStatus` in `$lib/api`.
- `budget.agent(id)` + `agentHarness.tmux(id)` helpers.
- Conversation page (`routes/conversations/[id]/+page.svelte`):
  - Fetches budget + tmux state for `primaryAgent` reactively on
    navigation; refreshes when the primary agent changes.
  - **Downgrade chip** (amber border, `🪫`-style icon, "running on
    <model> (budget)") renders in the agent-status row when the
    sidecar has swapped in a local fallback. Hidden otherwise.
  - **Tmux attach button** renders in the header's icon cluster only
    when the harness advertises `available: true`. Currently hidden
    (no harness exposes tmux yet); wired to a stub click handler
    pending xterm.js integration alongside the first real
    tmux-exposing harness.

## What's deliberately not in this commit

- xterm.js integration + the WebSocket terminal stream. That lands
  with the first tmux-exposing harness (pi, via task 10's real
  agent flow) — doing it now would be speculative plumbing against
  a missing backend.
- The `attach_tmux` override for `PiHarness`. Pi images running
  under c2w don't have a host-visible tmux socket until task 10
  wires the socket preopen through; adding the override now without
  the socket path would be fiction.

## Tests

- `svelte-check`: 0 errors, 115 warnings (all pre-existing).
- `cargo test -p xpressclaw-server --features metal --lib`:
  53 pass (no regressions).
- clippy + rustfmt clean.

Covers MVP criterion 4 UX surface + criterion 6 UX surface from
ADR-023. The signals are wired end-to-end: a user watching a
conversation sees the downgrade the instant the sidecar triggers it.
…ADR-023 task 10 phase 1)

Makes the desktop app runnable end-to-end on a machine with zero pi
harness image available, so the spike's whole stack can be smoke-tested
in the real UI. Real GHCR OCI pull is task 10 phase 2 (owed once a pi
WASM is published); until then, agents launch against a bundled noop
harness that ships in the binary.

## Changes

**`HarnessImageResolver::with_fallback`** — new constructor. When the
image ref doesn't resolve to a local file, writes the bundled noop
WASM into the cache dir and returns that path. Bundled WAT is compiled
to WASM on first use via `wat::parse_str` (moved from dev-dep to
regular dep). Old `::new()` constructor keeps the strict "file-only,
else error" behavior for tests + the `pi-smoke` CLI.
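
The fallback semantics could look roughly like this. Everything here is hypothetical shape, not the real resolver: `BUNDLED_NOOP` stands in for the WAT-compiled noop module, and the cache filename is invented for illustration.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Stand-in for the embedded noop module; just the 8-byte wasm
/// magic + version header, for illustration only.
const BUNDLED_NOOP: &[u8] = b"\0asm\x01\0\0\0";

/// Sketch of the with_fallback resolution: local file wins,
/// otherwise the bundled noop is materialized into the cache dir.
fn resolve_with_fallback(image_ref: &str, cache_dir: &Path) -> std::io::Result<PathBuf> {
    let candidate = Path::new(image_ref);
    if candidate.is_file() {
        return Ok(candidate.to_path_buf());
    }
    fs::create_dir_all(cache_dir)?;
    let fallback = cache_dir.join("bundled-noop.wasm"); // hypothetical name
    if !fallback.exists() {
        fs::write(&fallback, BUNDLED_NOOP)?;
    }
    Ok(fallback)
}

fn main() -> std::io::Result<()> {
    let cache = std::env::temp_dir().join("xclaw-harness-cache-demo");
    let _ = fs::remove_dir_all(&cache);
    let path = resolve_with_fallback("ghcr.io/xpressai/harnesses/pi:dev", &cache)?;
    assert!(path.ends_with("bundled-noop.wasm"));
    assert!(path.is_file());
    Ok(())
}
```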

**`AgentConfig::image: Option<String>`** — new optional field on the
config struct. Users can set a local `.wasm` path for development or
an OCI ref for production (once OCI lands). Missing / None → bundled
fallback. Serde default keeps existing configs compatible.

**`AppState::set_harness`** — setter for installing the harness at
runtime. Called once from `server::serve()` with
`PiHarness { C2wHarness { C2wRuntime }, HarnessImageResolver::with_fallback, <data>/workspaces }`.
Harness directory tree is `<data>/harness-cache/` + `<data>/workspaces/`;
both are created on startup.

**`agents::reconciler::start`** restored with a real
`reconcile_agents` that calls `harness.launch()` for agents with
`desired_status=running` and `harness.stop()` for stopped agents.
Errors land in the agent record's `AgentStatus::Error` — visible in
the UI. Ollama model reconciliation (unchanged since task 7) still
runs alongside.

**Reconciler signature gains `harness: Option<Arc<dyn Harness>>`** —
passed in from `server::serve()` via `state.harness().await` so the
reconciler and routes share one harness. When wasmtime init fails, the
harness arg is `None` and the agent loop logs once and skips; the
server stays up.

## What a user can now do in the desktop app

1. `xpressclaw init && xpressclaw up` (or launch the Tauri bundle).
2. Go through setup.
3. Create an agent — leave the image blank or set it to any path; anything
   that doesn't resolve falls back to the bundled noop harness.
4. Agent enters `starting` → `running` via the reconciler within ≤10s.
5. Send a chat message. The conversation page falls through to the
   LLM router (no endpoint port on the noop guest), so if a provider
   is configured (OpenAI/Anthropic/local/ollama) the response streams
   from there.
6. Budget tracking, transparent downgrade, xclaw verbs, memory, tasks
   — all work since they're server-side.

## Known limitations deliberately unfixed

- Agents don't self-respond; the noop guest has no HarnessClient
  endpoint. Fills in with a real pi image via task 10 phase 2.
- Task dispatcher still early-returns Requeue because it doesn't have
  `Arc<dyn Harness>` in scope; threading that through the dispatcher's
  state machine lands with the same commit that makes pi actually
  respond (task 10 phase 2).
- Agent edit/create UI in the frontend doesn't expose the `image`
  field — users configuring non-default images edit the YAML. UI
  surface is task-12-phase-C polish.

## Tests

- 4 harness unit tests pass, including the new
  `resolver_with_fallback_materializes_bundled_wasm`.
- 327 core + 53 server library tests pass (no regressions).
- clippy + rustfmt clean.
Adds a harness that can actually *respond* when the user sends a chat
message, so the desktop app demonstrates the agent → harness → LLM →
response loop honestly. Installed at server startup in place of the
previous bundled-WASM-noop path.

## What it does

`EchoHarness` implements the `Harness` trait. On `launch(agent_id)`:

1. Binds `127.0.0.1:0` — the OS picks an unused port per agent.
2. Spawns a Tokio task serving axum's `/v1/chat/completions` on that
   port (both streaming and non-streaming).
3. `endpoint_port(agent_id)` returns the bound port, so the
   conversations processor connects the real HarnessClient instead of
   falling back to the LlmRouter.
4. Each request prepends a pinned system-prompt banner identifying
   the harness and forwards through `LlmRouter::chat_stream` —
   responses come from whatever provider is configured (cloud or
   local) but visibly route through the harness first.

`stop(agent_id)` aborts the per-agent task; `stop_all` iterates; list
/ is_running / uptime_secs report live state.
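
Step 1's OS-assigned-port trick is standard and easy to demonstrate with std alone:

```rust
use std::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Binding port 0 asks the OS for any unused port; the harness then
    // reports the assigned port via endpoint_port(agent_id).
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let port = listener.local_addr()?.port();
    assert!(port != 0);
    println!("per-agent endpoint on 127.0.0.1:{port}");
    Ok(())
}
```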

## Why in-process and not WASM

A WASM guest listening on a host-reachable TCP port requires one of:
- wasmtime-wasi preview 2 + `wasi:sockets` wired in `C2wInstance`;
  forces every c2w guest (and the future real pi image) to be a
  preview-2 module.
- wasmtime-wasi preview 1 + a host shim that backs a preopen FD with
  a real socket; works with today's c2w but is custom plumbing.

Both are real work and only pay off once a real pi-as-c2w image
exists on GHCR (task 10 phase 2). Until then EchoHarness lives behind
the same `Harness` trait + `AppState::harness()` surface, so the
swap to a WASM harness is a one-line change in `server::serve()` the
moment the images are ready.

## What users see now

1. Launch the desktop app.
2. Configure an agent.
3. Reconciler picks up `desired_status=running` and calls
   `EchoHarness::launch`; the agent gets a real host port and status
   goes to `running`.
4. Send a chat message. The conversation flow reaches the agent's
   per-agent HTTP server, which forwards through the LLM router.
5. The agent's reply starts with the harness banner — proof that the
   response flowed through the harness, not directly.
6. Budget tracking, transparent downgrade, xclaw bridge, memory,
   tasks: all continue working.

## Tests

- 3 new unit tests (banner prepend with + without existing system
  message, lifecycle roundtrip).
- 56 server library tests total (53 prior + 3 new).
- 327 core tests still pass. clippy + rustfmt clean.

## Known limitations

- EchoHarness is in-process and has full host access — *not* the
  sandboxed story ADR-023 promises. Clearly marked in the module
  doc + commit as a demo path; replaced by the c2w+PiHarness path
  once real images exist.
- logs() returns empty (harness logs via `tracing` into the server's
  log stream; no separate capture).
- Snapshot/restore aren't meaningful for an in-process harness;
  falls back to trait default (unsupported).
… task 10 phase 2)

Closes the "fake fallback only" limitation of the image resolver so
you can test the WASM-sandboxed harness against a real OCI registry
— GHCR in production, local podman during dev — before committing
to any merge.

## OCI pull

`HarnessImageResolver::resolve` now dispatches on ref shape:

- Filesystem path → use directly (dev path).
- `host[:port]/path[:tag]` → OCI artifact pull via `oci-client`.
- Else, with fallback enabled → materialize the bundled noop WASM.
- Else → error explaining what was expected.

The OCI path pulls the manifest, fetches the first layer's blob, and
caches it on disk keyed by manifest digest so retags/repulls are free.
Plain HTTP is auto-enabled when the registry is localhost /
127.0.0.1 / ::1, so `podman run -p 5000:5000 registry:2` works out of
the box without TLS ceremony.

Auth: reads `XPRESSCLAW_REGISTRY_TOKEN` from the environment and
sends it as `Bearer`; anonymous otherwise. `gh auth token` piped into
this env var is the simplest GHCR path once real images ship there.

New `is_local_registry` + `looks_like_oci_ref` helpers isolate the
heuristics, each with unit coverage.
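
Plausible shapes for those two helpers, sketched from the behavior described above; the real implementations may differ in detail (e.g. IPv6 handling):

```rust
/// Registries at loopback get plain HTTP instead of TLS.
fn is_local_registry(host: &str) -> bool {
    matches!(host, "localhost" | "127.0.0.1" | "::1")
        || host.starts_with("localhost:")
        || host.starts_with("127.0.0.1:")
}

/// `host[:port]/path[:tag]` -- the first segment must look like a
/// registry host (a dot or a port colon) so plain file paths don't match.
fn looks_like_oci_ref(image_ref: &str) -> bool {
    match image_ref.split_once('/') {
        Some((host, path)) => !path.is_empty() && (host.contains('.') || host.contains(':')),
        None => false,
    }
}

fn main() {
    assert!(looks_like_oci_ref("ghcr.io/xpressai/harnesses/pi:tag"));
    assert!(looks_like_oci_ref("localhost:5000/pi:dev"));
    assert!(!looks_like_oci_ref("target/debug/pi.wasm"));
    assert!(is_local_registry("localhost:5000"));
    assert!(!is_local_registry("ghcr.io"));
}
```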

## Harness backend switch

`server::serve()` reads `XPRESSCLAW_HARNESS`:

- `echo` (default) — the in-process EchoHarness from the prior
  commit. No external dependencies, works out of the box.
- `pi` — installs `PiHarness` on `C2wRuntime`. Use with
  `XPRESSCLAW_HARNESS=pi xpressclaw up` when a WASM harness image
  is available. Falls back to echo if wasmtime init fails.

## Local podman test recipe

```
# Run a registry locally
podman run -d -p 5000:5000 --name registry registry:2

# Push a WASM blob (use the bundled noop for now, or your own pi build)
oras push localhost:5000/pi:dev pi.wasm

# Configure an agent with image: localhost:5000/pi:dev in xpressclaw.yaml

# Start xpressclaw with the pi harness
XPRESSCLAW_HARNESS=pi xpressclaw up
```

`PiHarness::launch` → resolver pulls from the local registry, caches
under `<data>/harness-cache/sha256-<digest>.wasm`, `C2wHarness`
instantiates it on wasmtime. Agent status goes to `running`.

## Known limitations

Today the only WASM you can realistically push is something that
exits immediately (like the bundled noop). A real "serves HTTP on a
host-reachable port" harness still needs wasi-sockets wiring in
`C2wInstance` (wasmtime-wasi preview 2 switch) — that's separate
work. This commit proves the image-delivery pipeline works; the
content-that-actually-responds pipeline is the next piece.

## Tests

- 7 harness unit tests pass (3 new: `oci_ref_heuristic`,
  `local_registry_detection`, plus the two renamed resolver tests).
- 56 server library tests pass.
- 327 core tests pass.
- clippy + fmt clean.
Extends build.sh with an opt-out push step matching the rest of the
script's convention (like --skip-docker / --skip-test). Runs by
default; pass --skip-push to skip.

- `build.sh --pi-image=<ref>` or `XCLAW_PI_IMAGE=...` overrides the
  target. Default is `localhost:5000/pi:dev` so `podman run -d -p
  5000:5000 --name xclaw-registry registry:2` is enough setup.
- Skips gracefully with a one-line message when `oras` isn't
  installed or no registry responds at the ref's host — matches how
  --skip-docker behaves when `docker` is absent. Doesn't hard-fail.
- Pushes as an OCI artifact with media type
  `application/vnd.xpressclaw.harness.wasm+v1` so future harness
  types (codex, opencode) can share a repo and differentiate by
  media type.

Adds a `xpressclaw write-bundled-wasm <path>` CLI subcommand that
materializes the bundled noop WASM to a given file, so the push
step doesn't need `wat2wasm`/wabt on the host — it just asks the
already-built CLI to dump its bundled wasm.

Once a real pi image is being compiled via c2w, swap
write-bundled-wasm for a real build command and this same push
step handles it.
Before this, EchoHarness was a dumb proxy — it forwarded chat
completion requests to the LLM router without injecting available MCP
tools or handling tool_calls in the response. The model noticed the
absence and started *narrating* tool calls in prose
(`search_memory("user preferences")` as code-block text), which the
user flagged: tools weren't actually being executed.

Root cause: the agent loop lived in the Docker-era harness container
(claude-agent-sdk did tool dispatch internally). EchoHarness replaced
that with nothing. Fix: do the loop in EchoHarness.

## What the handler does now

1. Prepend the harness banner to the system prompt (unchanged).
2. If the caller didn't set `tools`, inject all MCP tool schemas from
   the shared `McpManager` so the LLM knows what's callable.
3. Call the LLM non-streaming. Up to `MAX_TOOL_TURNS` (20) times:
   - If the response has `tool_calls`, append the assistant message
     to history, execute each call via `McpManager::call_tool`, then
     append `tool`-role messages with the flattened text of each
     result. Loop.
   - If no `tool_calls`, this is the terminal turn.
4. For the terminal turn: if the caller asked for streaming,
   re-invoke the LLM in streaming mode with the full accumulated
   history and stream the output. Otherwise return the JSON as-is.
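
The bounded loop above can be sketched as follows. `Turn` and the two closures are stand-ins for the LLM response shape and `McpManager::call_tool`; only `MAX_TOOL_TURNS = 20` and the overall control flow come from the commit.

```rust
/// Maximum tool-using turns before the handler gives up, per the commit.
const MAX_TOOL_TURNS: usize = 20;

/// Stand-in for a chat completion: either tool calls or a final answer.
enum Turn {
    ToolCalls(Vec<String>), // names of tools the model asked to run
    Final(String),          // terminal answer for the user
}

fn run_agent_loop(
    mut llm: impl FnMut(&[String]) -> Turn,
    mut call_tool: impl FnMut(&str) -> String,
) -> Result<String, &'static str> {
    let mut history: Vec<String> = vec!["system: harness banner".into()];
    for _ in 0..MAX_TOOL_TURNS {
        match llm(&history) {
            Turn::Final(answer) => return Ok(answer),
            Turn::ToolCalls(calls) => {
                for name in calls {
                    // Tool turns stay internal; only appended to history,
                    // never surfaced to the user as chat messages.
                    let result = call_tool(&name);
                    history.push(format!("tool {name}: {result}"));
                }
            }
        }
    }
    Err("tool-turn budget exceeded") // maps to the HTTP 507 described above
}

fn main() {
    let mut turns = 0;
    let answer = run_agent_loop(
        |_history: &[String]| {
            turns += 1;
            if turns == 1 {
                Turn::ToolCalls(vec!["search_memory".into()])
            } else {
                Turn::Final("done".into())
            }
        },
        |name| format!("{name} ok"),
    );
    assert_eq!(answer.unwrap(), "done");
}
```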

Tool-using turns never surface as chat messages to the user — they're
internal. The user sees the final answer with tools having been
invoked behind the scenes. Matches the claude-agent-sdk behavior
xpressclaw was originally designed around.

## Wire changes

- `EchoHarness::new(router, mcp_manager)` — added second arg.
- `EchoHandlerState` gains `mcp_manager: Arc<McpManager>`.
- Server startup: pass `state.mcp_manager.clone()` when constructing.
- New helper: `format_tool_result(&McpToolResult) -> String` flattens
  text/image/resource blocks for tool-role messages.

## Limits

- `MAX_TOOL_TURNS = 20` — generous for normal multi-step tasks,
  tight enough to fail fast on runaway loops. Exceeding it returns
  HTTP 507 with a diagnostic.
- Image results render as `[image: <mime>]` placeholder text; binary
  data is dropped. LLMs that want to *receive* images back need a
  vision-capable endpoint and per-message image support we haven't
  wired. Acceptable for text-first tools.
- Streaming re-runs the final turn in streaming mode. Costs one
  extra (cheap) LLM call per conversation. Fair trade for
  token-by-token UI output.

Tests: 3 existing `echo_harness` tests pass. Full behavioral coverage
of the agent loop requires a mock LLM provider + MCP manager — that's
larger plumbing than fits here; the code is exercised end-to-end
whenever the desktop app issues a tool-using request.
@wmeddie wmeddie deployed to integration April 21, 2026 00:44 — with GitHub Actions Active