ADR-023: pi-agent harness and wasmtime/c2w containers#100
Draft
Establishes the direction for the next spike attempt after spike/wanix-agents drifted out of scope. The ADR reframes Xpressclaw as an explicit meta-harness, removes Docker in full (superseding ADR-003), embeds c2w on wasmtime as the sole container runtime (like llama-cpp), distributes images via GHCR, and makes the sidecar the single LLM endpoint for every harness so budget-aware transparent downgrade to local inference is an architectural property. V1 scope is two harnesses (pi + retrofitted built-in) and seven MVP exit criteria, so the next attempt has something to be held accountable to.
First slice of the ADR-023 meta-harness refactor. Defines the `Harness` trait in `xpressclaw-core::harness`, implements it for the existing `DockerManager` so the rest of the codebase gains a trait-object path without any behavior change, and exposes an `AppState::harness()` accessor alongside the existing `AppState::docker()`. The trait is deliberately narrow — only the lifecycle + endpoint + observability surface the reconciler, task dispatcher, and message processor actually consume. Image management, snapshotting, and tmux-attach ship in follow-up commits (tasks 2/3/8/9). No callers migrated yet. This is scaffolding; subsequent commits in this task swap `AppState::docker()` users over to `AppState::harness()` one module at a time.
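The narrow trait surface described above could look roughly like this minimal sketch. It is illustrative only: the error type, `AgentId` alias, and any method names beyond `launch`/`stop`/`endpoint_port`/`logs` are assumptions, not the crate's real signatures.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

pub type AgentId = String;

#[derive(Debug)]
pub struct HarnessError(pub String);

// Hypothetical shape of the lifecycle + endpoint + observability surface.
pub trait Harness: Send + Sync {
    /// Start a guest for this agent.
    fn launch(&self, agent_id: &AgentId) -> Result<(), HarnessError>;
    /// Stop the agent's guest.
    fn stop(&self, agent_id: &AgentId) -> Result<(), HarnessError>;
    /// Host port of the guest's HTTP endpoint, if any.
    fn endpoint_port(&self, agent_id: &AgentId) -> Option<u16>;
    /// Captured (stdout, stderr), if the harness records them.
    fn logs(&self, agent_id: &AgentId) -> (String, String);
}

// A trivial in-memory impl showing the trait-object path the server uses.
#[derive(Default)]
struct NoopHarness {
    running: Mutex<HashSet<AgentId>>,
}

impl Harness for NoopHarness {
    fn launch(&self, agent_id: &AgentId) -> Result<(), HarnessError> {
        self.running.lock().unwrap().insert(agent_id.clone());
        Ok(())
    }
    fn stop(&self, agent_id: &AgentId) -> Result<(), HarnessError> {
        self.running.lock().unwrap().remove(agent_id);
        Ok(())
    }
    fn endpoint_port(&self, _agent_id: &AgentId) -> Option<u16> {
        None
    }
    fn logs(&self, _agent_id: &AgentId) -> (String, String) {
        (String::new(), String::new())
    }
}

fn main() {
    // Callers hold an Arc<dyn Harness>, never a concrete backend.
    let h: Arc<dyn Harness> = Arc::new(NoopHarness::default());
    h.launch(&"agent-1".to_string()).unwrap();
    assert_eq!(h.endpoint_port(&"agent-1".to_string()), None);
    h.stop(&"agent-1".to_string()).unwrap();
    println!("ok");
}
```

Keeping the trait this narrow is what lets Docker, c2w, and later in-process backends swap behind one `Arc<dyn Harness>` accessor.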
Replaces two direct DockerManager::connect() sites with trait-object calls via state.harness() and endpoint_port(agent_id). Removes two per-request fresh Docker connections — the harness() accessor reuses the shared cached connection. This also proves the Harness trait in hot paths (message streaming + cancel) before landing the c2w implementation. Reconciler, task dispatcher, and apps routes stay on DockerManager for now; they migrate once C2wHarness exists and the switch has architectural payoff instead of being a mechanical rename.
Embeds wasmtime 27 + wasmtime-wasi as workspace deps and adds a new xpressclaw_core::c2w module with the low-level runtime primitive that task 3 (C2wHarness) will compose with. C2wRuntime wraps wasmtime Engine with:
- async_support for tokio-driven guests
- epoch_interruption enabled; a background Tokio task ticks the engine epoch counter every 50ms so guests can be deadline-aborted (this is the rollback primitive)
- backtrace capture for diagnostics

InstanceSpec + C2wInstance model the per-agent guest: env, preopened dirs (for the host-guest filesystem bridge), WASI args, optional epoch deadline. MVP exposes run_to_completion only — long-lived instances with an HTTP endpoint land in task 3.

Target WASI version is preview 1 because c2w emits preview-1 modules today. If c2w adopts preview 2 we switch; the module layering is designed to isolate that choice.

Tests: a unit test confirms the runtime constructs and rejects bogus module bytes. End-to-end (running an actual c2w-compiled image) requires having an image built, which lands in task 3.
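The epoch mechanism can be illustrated without wasmtime: a background ticker advances a counter, and a guest's deadline is an *absolute* counter value, not a relative one. This is a stand-in sketch (the real code calls `Engine::increment_epoch` on the wasmtime engine); it also shows why a store-default deadline of 0 traps almost immediately once the ticker has started.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Absolute-deadline check: the guest is interrupted once the engine's
/// epoch counter reaches its store deadline. A default deadline of 0
/// therefore trips on the very first tick.
fn past_deadline(current_epoch: u64, deadline: u64) -> bool {
    current_epoch >= deadline
}

fn main() {
    // Stand-in for the engine epoch counter.
    let epoch = Arc::new(AtomicU64::new(0));
    let ticker = Arc::clone(&epoch);

    // Background tick driver, one tick every 50ms (as described above).
    thread::spawn(move || loop {
        thread::sleep(Duration::from_millis(50));
        ticker.fetch_add(1, Ordering::SeqCst);
    });

    // Deadline "now + 2 ticks" (~100ms): absolute, relative to nothing.
    let deadline = epoch.load(Ordering::SeqCst) + 2;
    thread::sleep(Duration::from_millis(300));
    assert!(past_deadline(epoch.load(Ordering::SeqCst), deadline));
    println!("guest would have been deadline-aborted");
}
```

A store that never sets a deadline needs a huge default (the commit below uses u64::MAX/2) precisely because `past_deadline(any_tick, 0)` is true the moment the counter moves.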
Adds `C2wHarness` — a `Harness` implementation backed by the `C2wRuntime` primitive from task 2 — and a `xpressclaw c2w-smoke` subcommand that exercises the full lifecycle on a real machine.

C2wHarness scope:
- Per-agent tokio tasks drive each guest to completion (or forever, for long-lived harnesses).
- Lifecycle: launch/stop/stop_all/list/is_running/uptime_secs work.
- `ContainerSpec.image` is interpreted as a filesystem path to a prebuilt WASM module — GHCR pulling + OCI-to-WASM conversion ships in task 4 without changing call sites.
- Stop aborts the driver task directly rather than waiting on the epoch deadline, which would add up to 50ms of latency.

Deferred to follow-up commits in this series:
- Real stdout/stderr capture into the returned Harness::logs buffer. Right now guests inherit the process stdio; `logs()` returns empty strings. Requires threading MemoryOutputPipe through C2wInstance; a minor refactor of the primitive's API, so it lands cleanly as 3b.
- Host-side port exposure (endpoint_port returns None). Requires wasi-sockets plumbing that only becomes real with an HTTP-serving harness image — ships with task 4 alongside PiHarness.

The `xpressclaw c2w-smoke` subcommand builds a noop WASI guest from embedded WAT and launches it through C2wHarness end-to-end. This is the first thing a developer can actually run to verify the runtime works on their machine. The subcommand is intentionally temporary — task 4 adds `xpressclaw harness add/run`, which subsumes this with a real workload.

Non-obvious bug fixed while writing this: wasmtime's epoch deadline is absolute, not relative, and the background tick driver advances the engine's counter from runtime construction. A store-default deadline of 0 therefore traps the first time the counter ticks past 0 — basically immediately. Fixed by setting a huge default deadline (u64::MAX/2 ≈ 14 billion years at 50ms/tick); caller-specified deadlines land as a follow-up when snapshot/rollback (task 8) needs precise per-step budgets.
Tests: c2w unit test (runtime smoke) + harness unit test (launch-and-run noop wasm) both pass. Manual verification via the c2w-smoke CLI also passes end-to-end on macOS.
Adds the pi-agent harness layer on top of C2wHarness. Pi-specific conventions baked into every launch:
- Per-agent workspace dir created on the host and preopened into the guest at /workspace.
- OPENAI_API_BASE env var defaulted to http://127.0.0.1:8935/v1 so pi-compiled guests hit the xpressclaw sidecar (task 6 makes this endpoint real).
- XCLAW_SOCKET env var defaulted to /run/xclaw.sock so pi's shell verbs know where to find the bridge (task 5 wires it).
- Caller-provided env/volumes win on conflict — PiHarness only fills in defaults.

HarnessImageResolver is scaffolded with a file-path-only resolve(). OCI refs (ghcr.io/xpressai/harnesses/pi:tag) return a clear error that names task 4b. That task is in the tracker; the resolver interface is stable, so the eventual fill-in doesn't churn call sites.

Two unit tests cover the pi launch path and the OCI-stub error. A new `xpressclaw pi-smoke` CLI subcommand runs the whole layer end-to-end with a noop WASM guest — creates the workspace, seeds env, launches, waits for exit, cleans up. Companion to c2w-smoke; both get removed when tasks 5/6 deliver the real pi flow.
Implements the xclaw shell bridge that lets non-MCP harnesses (pi and
friends) talk to xpressclaw as if they were using shell commands
instead of MCP tools (ADR-023 §7).
Wire format (xpressclaw_core::xclaw):
- Newline-delimited JSON over a Unix socket. One connection = one
request, one response, close. Debuggable with socat/nc.
- Verbs are dot-separated (memory.add, memory.list, version); args
are a flat JSON object; agent_id rides along for per-agent
attribution.
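One request line under this wire format could be built as below. This is a hand-rolled sketch for illustration — the real implementation presumably uses serde_json, and only the field names (`verb`, `args`, `agent_id`) are taken from the description above.

```rust
/// Build one newline-delimited JSON request for the xclaw socket.
/// Naive string escaping is omitted; sketch only.
fn request_line(verb: &str, agent_id: &str, args: &[(&str, &str)]) -> String {
    let args_json: Vec<String> = args
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", k, v))
        .collect();
    format!(
        "{{\"verb\":\"{}\",\"agent_id\":\"{}\",\"args\":{{{}}}}}\n",
        verb,
        agent_id,
        args_json.join(",")
    )
}

fn main() {
    let line = request_line("memory.add", "agent-1", &[("content", "hi")]);
    assert!(line.contains("\"verb\":\"memory.add\""));
    assert!(line.ends_with("}\n"));
    // One connection = one such line, one response line, close.
    print!("{line}");
}
```

Because every request is one JSON line over a Unix socket, `echo '<line>' | socat - UNIX-CONNECT:/run/xclaw.sock` is all it takes to debug a verb by hand.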
Server listener (xpressclaw_server::xclaw_bridge):
- Bound at <data_dir>/run/xclaw.sock during server startup.
- Current verbs: version, memory.add, memory.list. More verbs
(task.*, budget, log, ask) follow the same shape and drop in
without protocol changes.
- Unix-only; on Windows the start() fn logs a warning and no-ops
(WASM guests don't care about host OS, only the host socket does).
Client (xpressclaw-cli produces a second binary `xclaw`):
- argv → verb+args parser: `xclaw memory add --content "hi" --tags a,b`
becomes {verb: "memory.add", args: {content, tags}}.
- Reads XCLAW_SOCKET + XCLAW_AGENT_ID env vars (set by PiHarness
from task 4's constants).
- Exit codes: 0=success, 1=server-side verb failed, 2=transport,
3=usage. Scripts in guests can branch on these.
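The argv-to-verb mapping can be sketched as below: leading bare words join into the dot-separated verb, and `--key value` pairs become args. This is a simplified stand-in; the real parser in the `xclaw` binary is assumed to handle quoting and error cases this sketch skips.

```rust
/// Sketch: `xclaw memory add --content "hi" --tags a,b`
/// -> verb "memory.add", args [("content","hi"), ("tags","a,b")].
fn parse_argv(argv: &[&str]) -> (String, Vec<(String, String)>) {
    let mut verb_parts = Vec::new();
    let mut args = Vec::new();
    let mut i = 0;
    while i < argv.len() {
        if let Some(key) = argv[i].strip_prefix("--") {
            // --key consumes the next token as its value.
            let value = argv.get(i + 1).copied().unwrap_or("").to_string();
            args.push((key.to_string(), value));
            i += 2;
        } else {
            // Bare words accumulate into the dotted verb.
            verb_parts.push(argv[i]);
            i += 1;
        }
    }
    (verb_parts.join("."), args)
}

fn main() {
    let (verb, args) = parse_argv(&["memory", "add", "--content", "hi", "--tags", "a,b"]);
    assert_eq!(verb, "memory.add");
    assert_eq!(args[0], ("content".to_string(), "hi".to_string()));
    println!("verb={verb}");
}
```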
Tests: protocol roundtrip in core, transport roundtrip in server
(real UnixStream + UnixListener), argv parser in the client. No
end-to-end smoke CLI yet — manually runnable today by starting
`xpressclaw up` and invoking `XCLAW_SOCKET=<data>/run/xclaw.sock
xclaw version` from another terminal; add an automated smoke once
PiHarness (task 4) mounts the socket into a live guest.
Verb coverage is deliberately narrow for this commit. Remaining
verbs in ADR-023 §7 (task.create/update/status/list, budget, log,
ask) follow the same dispatch pattern and land in follow-up commits
as each verb's backing API is needed.
…023 task 6)
Extends the existing OpenAI-compatible endpoint at /v1/chat/completions
into the single LLM entry point every harness talks to per ADR-023 §6.
The endpoint already had agent_id extraction and degraded-model
override; this commit fills the remaining gaps:
- Hard-stop enforcement. If the agent is paused or `on_exceeded: stop`
over limit, the request is refused with HTTP 429. `alert` mode is
logged but lets the request through.
- Streaming support. `{"stream": true}` routes through chat_stream()
and returns OpenAI-style SSE with chunks passed through unmodified
and a terminal `data: [DONE]`.
- Token usage recording. On non-streaming completion, pulls the
`usage` field off the provider response and writes a `usage_logs`
row via `CostTracker`, then updates `BudgetManager` spend. On
streaming, tokens are counted approximately (chars/4 for output,
0 for prompt) since `ChatCompletionChunk` doesn't carry usage;
proper streaming accounting is follow-up task 11.
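The admission rule and the streaming token estimate described above reduce to a few lines. All names here are illustrative stand-ins, not the crate's real types; only the behavior (pause and `on_exceeded: stop` refuse with 429, `alert` passes through, output tokens ≈ chars/4 on streams) comes from the commit.

```rust
/// Stand-in for the budget config's on_exceeded mode.
#[derive(PartialEq)]
enum OnExceeded {
    Stop,
    Alert,
}

/// Hard-stop decision for /v1/chat/completions.
fn admit_request(paused: bool, over_limit: bool, mode: OnExceeded) -> Result<(), u16> {
    if paused {
        return Err(429); // paused agent: refuse outright
    }
    if over_limit && mode == OnExceeded::Stop {
        return Err(429); // hard stop when over budget
    }
    Ok(()) // Alert mode is logged but lets the request through
}

/// Rough output-token estimate for the streaming path, where
/// ChatCompletionChunk carries no usage field: chars / 4.
fn approx_output_tokens(streamed_text: &str) -> u64 {
    (streamed_text.chars().count() as u64) / 4
}

fn main() {
    assert_eq!(admit_request(false, true, OnExceeded::Stop), Err(429));
    assert_eq!(admit_request(false, true, OnExceeded::Alert), Ok(()));
    assert_eq!(admit_request(true, false, OnExceeded::Alert), Err(429));
    assert_eq!(approx_output_tokens("12345678"), 2);
    println!("ok");
}
```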
Three new unit tests use a canned in-crate provider:
- records_usage_for_agent: row lands in usage_logs after a request
- works_without_auth: still 200, no usage recorded
- honors_degraded_model_override: seeded degraded_model in budget_state
makes the provider see `"local"` instead of the caller's
`"canned-model"` — the ADR-023 §6 "transparent downgrade" promise.
Completes ADR-023 decision 4. Docker is no longer a dependency; agent workloads run on wasmtime + c2w exclusively.

## What was deleted
- `crates/xpressclaw-core/src/docker/` module (manager, images, bollard plumbing) — 984 lines gone.
- `crates/xpressclaw-core/src/runtime.rs` — dead-code orchestrator (no external callers).
- `bollard` workspace + core dependency.
- `impl Harness for DockerManager` scaffolding from harness/mod.rs.
- `AppState::docker()` accessor and the docker field on AppState.
- Docker check in `xpressclaw init` and `xpressclaw up`.
- Graceful-shutdown Docker pass in server.rs (replaced with `harness.stop_all()`).
- `impl From<bollard::errors::Error> for Error`.

## What stayed but got re-wired
- `ContainerSpec` / `ContainerInfo` / `VolumeMount` types moved from `docker/manager.rs` to `harness/types.rs`. They describe the generic launch contract and have no Docker specifics.
- `AppState::harness()` now returns the stored `Arc<dyn Harness>` directly instead of wrapping DockerManager.
- Server routes (agents, conversations) use `state.harness()` for live status / port lookup / stop / logs.

## What got stubbed pending follow-up
- `agents::reconciler` retains only its Ollama-model reconciliation. Agent container launching / orphan-task requeuing is paused until task 10 (GHCR pull) lands the real launch path.
- `tasks::dispatcher::load_task` early-returns Requeue — the remainder of the state machine needs an `Arc<dyn Harness>` handle, which lands with task 10.
- `routes::apps` — app-container endpoints (launch / logs / proxy) return 503. Agent-app containers need their own ADR post-spike; out of scope here.
- `routes::setup::check_docker` and `start_docker` kept as compatibility stubs that report `removed: true`. Task 12 rips the frontend setup-wizard step out.

## ADR / docs
- ADR-003 marked "Superseded by ADR-023".
- CLAUDE.md updated: container runtime is now "wasmtime + container2wasm", not Docker.

## Scope boundaries (deliberately unchanged)
- Agent session / message-processor paths: already trait-object via `AppState::harness()` from task 1's migration; they gracefully handle `None`, so the spike branch still compiles without real agents running.
- LLM sidecar (/v1/chat/completions): untouched — task 6 work stands.
- xclaw bridge: untouched — task 5 work stands.

## Diff summary
-2860 / +203 = net -2657 lines. All 327 core + 53 server library tests pass. Workspace builds clean (clippy: no errors; a few warnings about now-unused args on the stubbed app/dispatcher paths — intentionally left with `_` prefixes so they're easy to re-wire in task 10).
Adds the rollback-on-failure plumbing that backs MVP criterion 7 of
ADR-023 ("rogue `rm -rf /` → automatic rollback, host unaffected").
## Trait surface
`Harness` gains three new methods, all with sensible defaults:
- `snapshot(agent_id) -> SnapshotId` — capture the guest's persistent
state so a future `restore` can roll it back.
- `restore(agent_id, &SnapshotId)` — revert persistent state to a
prior snapshot.
- `delete_snapshot(&SnapshotId)` — free the snapshot's backing
storage.
The default implementations return "not supported" for `snapshot` and
`restore` and no-op for `delete_snapshot`. Harnesses that can persist
guest state override as needed.
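The default-method pattern can be sketched as follows. In the real crate these are methods on `Harness` itself; here they sit on a standalone trait so the sketch is self-contained, and the error type is an assumption.

```rust
/// Opaque handle to a captured snapshot.
pub struct SnapshotId(pub String);

/// Sketch of the snapshot extension with "not supported" defaults.
/// Harnesses that can persist guest state override as needed.
pub trait SnapshotSupport {
    fn snapshot(&self, _agent_id: &str) -> Result<SnapshotId, String> {
        Err("snapshot not supported by this harness".into())
    }
    fn restore(&self, _agent_id: &str, _id: &SnapshotId) -> Result<(), String> {
        Err("restore not supported by this harness".into())
    }
    fn delete_snapshot(&self, _id: &SnapshotId) -> Result<(), String> {
        Ok(()) // default no-op: nothing to free
    }
}

// A harness with no persistent state gets the defaults for free.
struct NoSnapshots;
impl SnapshotSupport for NoSnapshots {}

fn main() {
    let h = NoSnapshots;
    assert!(h.snapshot("agent-1").is_err());
    assert!(h.delete_snapshot(&SnapshotId("s1".into())).is_ok());
    println!("ok");
}
```

Defaulting the methods keeps existing `Harness` implementors (like the in-process demo harnesses) compiling unchanged while C2wHarness opts in.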
## C2wHarness implementation
Tracks the preopen list per running agent; `snapshot` copies each
preopened directory to `<cache_dir>/snapshots/<uuid>/<index>/`.
`restore` removes each original directory and repopulates it from the
snapshot copy. `delete_snapshot` recursively deletes the backing dir.
Scope honestly reflected in the code:
- Snapshot covers filesystem (preopens), not in-flight WASM memory
or tmux session pty. That matches the ADR's "drop the `Store` and
restart" model — the guest is expected to be re-instantiated
after restore.
- `restore` reverts filesystem but doesn't re-launch the guest —
the task-dispatcher caller (task 10) decides whether to stop/start.
- Symlinks inside a preopen aren't followed during copy; snapshots
contain regular files + directory structure.
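The copy step behind `snapshot` amounts to a recursive directory walk that copies regular files and directories and skips symlinks, as described above. This is a simplified sketch (error handling and metadata preservation omitted), not the crate's actual code:

```rust
use std::fs;
use std::path::Path;

/// Copy a preopened directory tree into a snapshot location.
/// Symlinks are skipped: snapshots hold regular files + dirs only.
fn copy_tree(src: &Path, dst: &Path) -> std::io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let ty = entry.file_type()?; // does not follow symlinks
        let to = dst.join(entry.file_name());
        if ty.is_symlink() {
            continue;
        } else if ty.is_dir() {
            copy_tree(&entry.path(), &to)?;
        } else {
            fs::copy(entry.path(), &to)?;
        }
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Tiny roundtrip: seed a workspace, snapshot it, check the copy.
    let base = std::env::temp_dir().join("xclaw-snap-demo");
    let src = base.join("workspace");
    let snap = base.join("snapshot");
    let _ = fs::remove_dir_all(&base);
    fs::create_dir_all(&src)?;
    fs::write(src.join("notes.txt"), "before rogue edit")?;
    copy_tree(&src, &snap)?;
    assert_eq!(fs::read_to_string(snap.join("notes.txt"))?, "before rogue edit");
    fs::remove_dir_all(&base)?;
    println!("ok");
    Ok(())
}
```

`restore` is the same walk in reverse after clearing the live directory, which is why the guest must be re-instantiated afterwards — the filesystem moves out from under any running instance.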
## Test + CLI smoke
New unit test `snapshot_and_restore_roundtrip_workspace` seeds a file
in a preopened dir, snapshots, mutates, restores, and asserts the
mutations are reverted.
New `xpressclaw rollback-smoke` subcommand runs the same flow from
the CLI — launches a c2w guest, simulates a rogue tool call
rewriting the workspace, restores the snapshot, and verifies the
filesystem is reverted. Prints a step-by-step narration that ends
with `Smoke test passed.` on success.
This is the seventh MVP exit criterion made runnable. Wiring into
the real task dispatcher (pre-step snapshot, on-failure restore)
lands when task 10 gives the dispatcher an `Arc<dyn Harness>` handle.
…k 9)
Surfaces two ADR-023 features in the conversation page: the
transparent budget downgrade from task 6 becomes visible, and the
tmux-attach entry point from a future pi harness has its UI slot
reserved.
## Backend
- `Harness::attach_tmux(agent_id)` added with default `None`. Concrete
harnesses (pi, future shell-native backends) override it to return
their tmux session descriptor.
- New `TmuxAttach { session_name, socket_path }` type.
- `GET /api/agents/:id/tmux` returns `{ available, session? }`.
- `GET /api/budget/:id` now includes `degraded_model` and `is_paused`
alongside the existing `BudgetSummary` fields so the UI can render
the downgrade chip without a second call.
## Frontend
- New API types `AgentBudgetState` and `AgentTmuxStatus` in `$lib/api`.
- `budget.agent(id)` + `agentHarness.tmux(id)` helpers.
- Conversation page (`routes/conversations/[id]/+page.svelte`):
- Fetches budget + tmux state for `primaryAgent` reactively on
navigation; refreshes when the primary agent changes.
- **Downgrade chip** (amber border, `🪫`-style icon, "running on
<model> (budget)") renders in the agent-status row when the
sidecar has swapped in a local fallback. Hidden otherwise.
- **Tmux attach button** renders in the header's icon cluster only
when the harness advertises `available: true`. Currently hidden
(no harness exposes tmux yet); wired to a stub click handler
pending xterm.js integration alongside the first real
tmux-exposing harness.
## What's deliberately not in this commit
- xterm.js integration + the WebSocket terminal stream. That lands
with the first tmux-exposing harness (pi, via task 10's real
agent flow) — doing it now would be speculative plumbing against
a missing backend.
- The `attach_tmux` override for `PiHarness`. Pi images running
under c2w don't have a host-visible tmux socket until task 10
wires the socket preopen through; adding the override now without
the socket path would be fiction.
## Tests
- `svelte-check`: 0 errors, 115 warnings (all pre-existing).
- `cargo test -p xpressclaw-server --features metal --lib`:
53 pass (no regressions).
- clippy + rustfmt clean.
Covers MVP criterion 4 UX surface + criterion 6 UX surface from
ADR-023. The signals are wired end-to-end: a user watching a
conversation sees the downgrade the instant the sidecar triggers it.
…ADR-023 task 10 phase 1)
Makes the desktop app runnable end-to-end on a machine with zero pi
harness image available, so the spike's whole stack can be smoke-tested
in the real UI. Real GHCR OCI pull is task 10 phase 2 (owed once a pi
WASM is published); until then, agents launch against a bundled noop
harness that ships in the binary.
## Changes
**`HarnessImageResolver::with_fallback`** — new constructor. When the
image ref doesn't resolve to a local file, writes the bundled noop
WASM into the cache dir and returns that path. Bundled WAT is compiled
to WASM on first use via `wat::parse_str` (moved from dev-dep to
regular dep). Old `::new()` constructor keeps the strict "file-only,
else error" behavior for tests + the `pi-smoke` CLI.
**`AgentConfig::image: Option<String>`** — new optional field on the
config struct. Users can set a local `.wasm` path for development or
an OCI ref for production (once OCI lands). Missing / None → bundled
fallback. Serde default keeps existing configs compatible.
**`AppState::set_harness`** — setter for installing the harness at
runtime. Called once from `server::serve()` with
`PiHarness { C2wHarness { C2wRuntime }, HarnessImageResolver::with_fallback, <data>/workspaces }`.
Harness directory tree is `<data>/harness-cache/` + `<data>/workspaces/`;
both are created on startup.
**`agents::reconciler::start`** restored with a real
`reconcile_agents` that calls `harness.launch()` for agents with
`desired_status=running` and `harness.stop()` for stopped agents.
Errors land in the agent record's `AgentStatus::Error` — visible in
the UI. Ollama model reconciliation (unchanged since task 7) still
runs alongside.
**Reconciler signature gains `harness: Option<Arc<dyn Harness>>`** —
passed in from `server::serve()` via `state.harness().await` so the
reconciler and routes share one harness. When wasmtime init fails, the
harness arg is `None` and the agent loop logs once and skips; the
server stays up.
## What a user can now do in the desktop app
1. `xpressclaw init && xpressclaw up` (or launch the Tauri bundle).
2. Go through setup.
3. Create an agent — leave image blank or set it to any path; missing
files fall back to the bundled noop harness.
4. The agent enters `starting` → `running` via the reconciler within 10s.
5. Send a chat message. The conversation page falls through to the
LLM router (no endpoint port on the noop guest), so if a provider
is configured (OpenAI/Anthropic/local/ollama) the response streams
from there.
6. Budget tracking, transparent downgrade, xclaw verbs, memory, tasks
— all work since they're server-side.
## Known limitations deliberately unfixed
- Agents don't self-respond; the noop guest has no HarnessClient
endpoint. Fills in with a real pi image via task 10 phase 2.
- Task dispatcher still early-returns Requeue because it doesn't have
`Arc<dyn Harness>` in scope; threading that through the dispatcher's
state machine lands with the same commit that makes pi actually
respond (task 10 phase 2).
- Agent edit/create UI in the frontend doesn't expose the `image`
field — users configuring non-default images edit the YAML. UI
surface is task-12-phase-C polish.
## Tests
- 4 harness unit tests pass, including the new
`resolver_with_fallback_materializes_bundled_wasm`.
- 327 core + 53 server library tests pass (no regressions).
- clippy + rustfmt clean.
Adds a harness that can actually *respond* when the user sends a chat message, so the desktop app demonstrates the agent → harness → LLM → response loop honestly. Installed at server startup in place of the previous bundled-WASM-noop path.

## What it does
`EchoHarness` implements the `Harness` trait. On `launch(agent_id)`:
1. Binds `127.0.0.1:0` — the OS picks an unused port per agent.
2. Spawns a Tokio task serving axum's `/v1/chat/completions` on that port (both streaming and non-streaming).
3. `endpoint_port(agent_id)` returns the bound port, so the conversations processor connects the real HarnessClient instead of falling back to the LlmRouter.
4. Each request prepends a pinned system-prompt banner identifying the harness and forwards through `LlmRouter::chat_stream` — responses come from whatever provider is configured (cloud or local) but visibly route through the harness first.

`stop(agent_id)` aborts the per-agent task; `stop_all` iterates; list / is_running / uptime_secs report live state.

## Why in-process and not WASM
A WASM guest listening on a host-reachable TCP port requires one of:
- wasmtime-wasi preview 2 + `wasi:sockets` wired in `C2wInstance`; forces every c2w guest (and the future real pi image) to be a preview-2 module.
- wasmtime-wasi preview 1 + a host shim that backs a preopen FD with a real socket; works with today's c2w but is custom plumbing.

Both are real work and only pay off once a real pi-as-c2w image exists on GHCR (task 10 phase 2). Until then EchoHarness lives behind the same `Harness` trait + `AppState::harness()` surface, so the swap to a WASM harness is a one-line change in `server::serve()` the moment the images are ready.

## What users see now
1. Launch the desktop app.
2. Configure an agent.
3. The reconciler picks up `desired_status=running` and calls `EchoHarness::launch`; the agent gets a real host port and status goes to `running`.
4. Send a chat message. The conversation flow reaches the agent's per-agent HTTP server, which forwards through the LLM router.
5. The agent's reply starts with the harness banner — proof that the response flowed through the harness, not directly.
6. Budget tracking, transparent downgrade, xclaw bridge, memory, tasks: all continue working.

## Tests
- 3 new unit tests (banner prepend with + without existing system message, lifecycle roundtrip).
- 56 server library tests total (53 prior + 3 new).
- 327 core tests still pass. clippy + rustfmt clean.

## Known limitations
- EchoHarness is in-process and has full host access — *not* the sandboxed story ADR-023 promises. Clearly marked in the module doc + commit as a demo path; replaced by the c2w+PiHarness path once real images exist.
- logs() returns empty (the harness logs via `tracing` into the server's log stream; no separate capture).
- Snapshot/restore aren't meaningful for an in-process harness; falls back to the trait default (unsupported).
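The per-agent port trick is just an ephemeral bind: ask the OS for port 0 and read back what it assigned. A minimal sketch (the real code hands the listener to axum rather than dropping it):

```rust
use std::net::TcpListener;

/// Bind 127.0.0.1:0 and report the OS-chosen port -- the value
/// endpoint_port(agent_id) would return for this agent.
fn bind_ephemeral() -> std::io::Result<u16> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    Ok(listener.local_addr()?.port())
}

fn main() {
    let port = bind_ephemeral().expect("bind failed");
    assert!(port > 0);
    println!("agent endpoint would be 127.0.0.1:{port}");
}
```

Binding port 0 per agent means no port registry or collision handling is needed: each launch gets a fresh, unused port for free.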
… task 10 phase 2)

Closes the "fake fallback only" limitation of the image resolver so you can test the WASM-sandboxed harness against a real OCI registry — GHCR in production, local podman during dev — before committing to any merge.

## OCI pull
`HarnessImageResolver::resolve` now dispatches on ref shape:
- Filesystem path → use directly (dev path).
- `host[:port]/path[:tag]` → OCI artifact pull via `oci-client`.
- Else, with fallback enabled → materialize the bundled noop WASM.
- Else → error explaining what was expected.

The OCI path pulls the manifest, fetches the first layer's blob, and caches it on disk keyed by manifest digest so retags/repulls are free. Plain HTTP is auto-enabled when the registry is localhost / 127.0.0.1 / ::1, so `podman run -p 5000:5000 registry:2` works out of the box without TLS ceremony.

Auth: reads `XPRESSCLAW_REGISTRY_TOKEN` from the environment and sends it as `Bearer`; anonymous otherwise. `gh auth token` piped into this env var is the simplest GHCR path once real images ship there.

New `is_local_registry` + `looks_like_oci_ref` helpers isolate the heuristics, each with unit coverage.

## Harness backend switch
`server::serve()` reads `XPRESSCLAW_HARNESS`:
- `echo` (default) — the in-process EchoHarness from the prior commit. No external dependencies, works out of the box.
- `pi` — installs `PiHarness` on `C2wRuntime`. Use with `XPRESSCLAW_HARNESS=pi xpressclaw up` when a WASM harness image is available. Falls back to echo if wasmtime init fails.
## Local podman test recipe
```
# Run a registry locally
podman run -d -p 5000:5000 --name registry registry:2
# Push a WASM blob (use the bundled noop for now, or your own pi build)
oras push localhost:5000/pi:dev pi.wasm
# Configure an agent with image: localhost:5000/pi:dev in xpressclaw.yaml
# Start xpressclaw with the pi harness
XPRESSCLAW_HARNESS=pi xpressclaw up
```
`PiHarness::launch` → resolver pulls from the local registry, caches under `<data>/harness-cache/sha256-<digest>.wasm`, `C2wHarness` instantiates it on wasmtime. Agent status goes to `running`.

## Known limitations
Today the only WASM you can realistically push is something that exits immediately (like the bundled noop). A real "serves HTTP on a host-reachable port" harness still needs wasi-sockets wiring in `C2wInstance` (the wasmtime-wasi preview 2 switch) — that's separate work. This commit proves the image-delivery pipeline works; the content-that-actually-responds pipeline is the next piece.

## Tests
- 7 harness unit tests pass (3 new: `oci_ref_heuristic`, `local_registry_detection`, plus the two renamed resolver tests).
- 56 server library tests pass.
- 327 core tests pass.
- clippy + fmt clean.
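The two ref-shape heuristics could look roughly like this. These are stand-in sketches — the real helpers in `HarnessImageResolver` may use different signatures and cover more edge cases:

```rust
/// Does this string look like host[:port]/path[:tag] rather than a
/// filesystem path? Heuristic: a slash-separated ref whose first
/// segment looks like a registry host, and no path-like prefix.
fn looks_like_oci_ref(s: &str) -> bool {
    if s.starts_with('/') || s.starts_with('.') {
        return false; // absolute or relative filesystem path
    }
    match s.split_once('/') {
        Some((host, path)) => {
            !path.is_empty()
                && (host.contains('.') || host.contains(':') || host == "localhost")
        }
        None => false,
    }
}

/// Is the registry local? If so, plain HTTP is acceptable.
fn is_local_registry(host_and_port: &str) -> bool {
    let host = host_and_port
        .split_once(':')
        .map(|(h, _)| h)
        .unwrap_or(host_and_port);
    matches!(host, "localhost" | "127.0.0.1")
        || host_and_port == "::1"
        || host_and_port.starts_with("[::1]")
}

fn main() {
    assert!(looks_like_oci_ref("localhost:5000/pi:dev"));
    assert!(looks_like_oci_ref("ghcr.io/xpressai/harnesses/pi:latest"));
    assert!(!looks_like_oci_ref("./pi.wasm"));
    assert!(!looks_like_oci_ref("/abs/path/pi.wasm"));
    assert!(is_local_registry("localhost:5000"));
    assert!(!is_local_registry("ghcr.io"));
    println!("ok");
}
```

The ordering matters: filesystem paths are checked first so a local `.wasm` file can never be mistaken for a registry pull.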
Extends build.sh with an opt-out push step matching the rest of the script's convention (like --skip-docker / --skip-test). Runs by default; pass --skip-push to skip.
- `build.sh --pi-image=<ref>` or `XCLAW_PI_IMAGE=...` overrides the target. Default is `localhost:5000/pi:dev`, so `podman run -d -p 5000:5000 --name xclaw-registry registry:2` is enough setup.
- Skips gracefully with a one-line message when `oras` isn't installed or no registry responds at the ref's host — matches how --skip-docker behaves when `docker` is absent. Doesn't hard-fail.
- Pushes as an OCI artifact with media type `application/vnd.xpressclaw.harness.wasm+v1` so future harness types (codex, opencode) can share a repo and differentiate by media type.

Adds a `xpressclaw write-bundled-wasm <path>` CLI subcommand that materializes the bundled noop WASM to a given file, so the push step doesn't need `wat2wasm`/wabt on the host — it just asks the already-built CLI to dump its bundled wasm. Once a real pi image is being compiled via c2w, swap write-bundled-wasm for a real build command and this same push step handles it.
Before this, EchoHarness was a dumb proxy — it forwarded chat
completion requests to the LLM router without injecting available MCP
tools or handling tool_calls in the response. The model noticed the
absence and started *narrating* tool calls in prose
(`search_memory("user preferences")` as code-block text), which the
user flagged: tools weren't actually being executed.
Root cause: the agent loop lived in the Docker-era harness container
(claude-agent-sdk did tool dispatch internally). EchoHarness replaced
that with nothing. Fix: do the loop in EchoHarness.
## What the handler does now
1. Prepend the harness banner to the system prompt (unchanged).
2. If the caller didn't set `tools`, inject all MCP tool schemas from
the shared `McpManager` so the LLM knows what's callable.
3. Call the LLM non-streaming. Up to `MAX_TOOL_TURNS` (20) times:
- If the response has `tool_calls`, append the assistant message
to history, execute each call via `McpManager::call_tool`, then
append `tool`-role messages with the flattened text of each
result. Loop.
- If no `tool_calls`, this is the terminal turn.
4. For the terminal turn: if the caller asked for streaming,
re-invoke the LLM in streaming mode with the full accumulated
history and stream the output. Otherwise return the JSON as-is.
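The loop structure above can be simulated in miniature: a mock "LLM" returns tool calls until it is done, the loop executes each call and feeds results back, and `MAX_TOOL_TURNS` bounds the whole thing. All types here are stand-ins for illustration, not the crate's real message or MCP types.

```rust
const MAX_TOOL_TURNS: usize = 20;

/// Stand-in for a chat-completion response.
enum LlmReply {
    ToolCalls(Vec<String>),
    Final(String),
}

/// Sketch of the agent loop: call the LLM, execute any requested
/// tools, append their results to history, repeat until a terminal
/// turn or the turn budget runs out.
fn run_tool_loop(
    mut llm: impl FnMut(&[String]) -> LlmReply,
    mut call_tool: impl FnMut(&str) -> String,
) -> Result<String, &'static str> {
    let mut history: Vec<String> = vec!["user: hi".into()];
    for _ in 0..MAX_TOOL_TURNS {
        match llm(&history) {
            LlmReply::ToolCalls(calls) => {
                for call in calls {
                    let result = call_tool(&call);
                    // tool-role message: never surfaced to the user
                    history.push(format!("tool:{call}={result}"));
                }
            }
            LlmReply::Final(text) => return Ok(text), // terminal turn
        }
    }
    Err("MAX_TOOL_TURNS exceeded") // maps to the handler's HTTP 507
}

fn main() {
    let mut turns = 0;
    let answer = run_tool_loop(
        |_history| {
            turns += 1;
            if turns == 1 {
                LlmReply::ToolCalls(vec!["search_memory".into()])
            } else {
                LlmReply::Final("done".into())
            }
        },
        |name| format!("result-of-{name}"),
    )
    .unwrap();
    assert_eq!(answer, "done");
    println!("{answer}");
}
```

A model that requests tools forever hits the `Err` branch instead of spinning, which is the fail-fast behavior the turn cap exists to provide.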
Tool-using turns never surface as chat messages to the user — they're
internal. The user sees the final answer with tools having been
invoked behind the scenes. Matches the claude-agent-sdk behavior
xpressclaw was originally designed around.
## Wire changes
- `EchoHarness::new(router, mcp_manager)` — added second arg.
- `EchoHandlerState` gains `mcp_manager: Arc<McpManager>`.
- Server startup: pass `state.mcp_manager.clone()` when constructing.
- New helper: `format_tool_result(&McpToolResult) -> String` flattens
text/image/resource blocks for tool-role messages.
## Limits
- `MAX_TOOL_TURNS = 20` — generous for normal multi-step tasks,
tight enough to fail fast on runaway loops. Exceeding it returns
HTTP 507 with a diagnostic.
- Image results render as `[image: <mime>]` placeholder text; binary
data is dropped. LLMs that want to *receive* images back need a
vision-capable endpoint and per-message image support we haven't
wired. Acceptable for text-first tools.
- Streaming re-runs the final turn in streaming mode. Costs one
extra (cheap) LLM call per conversation. Fair trade for
token-by-token UI output.
Tests: 3 existing `echo_harness` tests pass. Full behavioral coverage
of the agent loop requires a mock LLM provider + MCP manager — that's
larger plumbing than fits here; the code is exercised end-to-end
whenever the desktop app issues a tool-using request.
Summary
Draft PR for ADR-023, which sets the direction for a clean re-do of the container-isolation work after spike/wanix-agents (PR #98) drifted out of scope.
The ADR reframes Xpressclaw as an explicit meta-harness, locks in wasmtime + container2wasm as the sole container runtime (embedded like llama-cpp), removes Docker support entirely (supersedes ADR-003 in full), distributes harness images via GHCR, and makes the sidecar the single LLM endpoint for every harness so budget-aware transparent downgrade to local inference is architectural, not per-harness.
V1 ships only two harnesses — PiHarness (new) and a retrofitted built-in Claude-SDK harness — with seven concrete MVP exit criteria so the re-do has an anti-drift lever.

What this PR is not
Key decisions captured
- Harness images distributed via GHCR (ghcr.io/xpressai/harnesses/*)

Review focus

Test plan