diff --git a/.gitignore b/.gitignore
index cf9381d..ccf99db 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,4 @@ dist/
*.tsbuildinfo
.worktrees/
.superpowers/
+coverage/
diff --git a/.prettierignore b/.prettierignore
index 52af816..c45c1e5 100644
--- a/.prettierignore
+++ b/.prettierignore
@@ -2,3 +2,4 @@ dist/
node_modules/
pnpm-lock.yaml
charts/
+coverage/
diff --git a/README.md b/README.md
index bd60779..3bb6657 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
# @copilotkit/llmock [](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml) [](https://www.npmjs.com/package/@copilotkit/llmock)
-Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, and Cohere API formats, driven entirely by fixtures. Zero runtime dependencies.
+Mock infrastructure for AI application testing — LLM APIs, MCP tools, A2A agents, vector databases, search, and more. Real HTTP server on a real port, fixture-driven, zero runtime dependencies.
## Quick Start
@@ -23,72 +23,106 @@ const url = await mock.start();
await mock.stop();
```
-## When to Use This vs MSW
+## Usage Scenarios
-[MSW (Mock Service Worker)](https://mswjs.io/) is a popular API mocking library, but it solves a different problem.
+### In-process testing
-**The key difference is architecture.** llmock runs a real HTTP server on a port. MSW patches `http`/`https`/`fetch` modules inside a single Node.js process. MSW can only intercept requests from the process that calls `server.listen()` — child processes, separate services, and workers are unaffected.
+Use the programmatic API to start and stop the mock server in your test setup. It works with any test framework — Vitest, Jest, Playwright, Mocha, anything.
-This matters for E2E tests where multiple processes make LLM API calls:
+```typescript
+import { LLMock } from "@copilotkit/llmock";
+
+const mock = new LLMock({ port: 5555 });
+mock.loadFixtureDir("./fixtures");
+const url = await mock.start();
+process.env.OPENAI_BASE_URL = `${url}/v1`;
+
+// ... run tests ...
+
+await mock.stop();
+```
+
+### Running locally
+
+Use the CLI with `--watch` to hot-reload fixtures as you edit them. Point your app at the mock and iterate without touching real APIs.
+
+```bash
+llmock -p 4010 -f ./fixtures --watch
```
-Playwright test runner (Node)
- └─ controls browser → Next.js app (separate process)
- └─ OPENAI_BASE_URL → llmock :5555
- ├─ Mastra agent workers
- ├─ LangGraph workers
- └─ CopilotKit runtime
+
+### CI pipelines
+
+Use the Docker image with `--strict` mode and record-and-replay for deterministic, zero-cost CI runs.
+
+```yaml
+# GitHub Actions example
+- name: Start aimock
+ run: |
+ docker run -d --name aimock \
+ -v ./fixtures:/fixtures \
+ -p 4010:4010 \
+ ghcr.io/copilotkit/aimock \
+ llmock --strict -f /fixtures
+
+- name: Run tests
+ env:
+ OPENAI_BASE_URL: http://localhost:4010/v1
+ run: pnpm test
+
+- name: Stop aimock
+ run: docker stop aimock
```
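
If the container needs a moment to boot, a wait step avoids racing the test run. This is a hypothetical extra step, assuming the server exposes the `/health` endpoint described in the docs:

```yaml
# Optional: poll /health until the mock is up before starting tests
- name: Wait for aimock
  run: |
    for i in $(seq 1 30); do
      curl -sf http://localhost:4010/health && exit 0
      sleep 1
    done
    echo "mock never became healthy" && exit 1
```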
-MSW can't intercept any of those calls. llmock can — it's a real server on a real port.
+### Cross-language testing
-**Use llmock when:**
+The Docker image runs as a standalone HTTP server — any language that speaks HTTP can use it. Python, Go, Rust, Ruby, Java, anything.
-- Multiple processes need to hit the same mock (E2E tests, agent frameworks, microservices)
-- You want multi-provider SSE format out of the box (OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere)
-- You prefer defining fixtures as JSON files rather than code
-- You need a standalone CLI server
+```bash
+docker run -d -p 4010:4010 ghcr.io/copilotkit/aimock llmock -f /fixtures
-**Use MSW when:**
+# Python
+client = openai.OpenAI(base_url="http://localhost:4010/v1", api_key="mock")
-- All API calls originate from a single Node.js process (unit tests, SDK client tests)
-- You're mocking many different APIs, not just OpenAI
-- You want in-process interception without running a server
+# Go
+client := openai.NewClient(option.WithBaseURL("http://localhost:4010/v1"))
-| Capability | llmock | MSW |
-| ---------------------------- | --------------------- | ------------------------------------------------------------------------- |
-| Cross-process interception | **Yes** (real server) | **No** (in-process only) |
-| OpenAI Chat Completions SSE | **Built-in** | Manual — build `data: {json}\n\n` + `[DONE]` yourself |
-| OpenAI Responses API SSE | **Built-in** | Manual — MSW's `sse()` sends `data:` events, not OpenAI's `event:` format |
-| Claude Messages API SSE | **Built-in** | Manual — build `event:`/`data:` SSE yourself |
-| Gemini streaming | **Built-in** | Manual — build `data:` SSE yourself |
-| WebSocket APIs | **Built-in** | **No** |
-| Fixture file loading (JSON) | **Yes** | **No** — handlers are code-only |
-| Request journal / inspection | **Yes** | **No** — track requests manually |
-| Non-streaming responses | **Yes** | **Yes** |
-| Error injection (one-shot) | **Yes** | **Yes** (via `server.use()`) |
-| CLI for standalone use | **Yes** | **No** |
-| Zero dependencies | **Yes** | **No** (~300KB) |
+# Rust (async-openai)
+let config = OpenAIConfig::new().with_api_base("http://localhost:4010/v1");
+let client = Client::with_config(config);
+```
## Features
-- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [OpenAI Responses](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html) (streaming + Converse), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html), [Vertex AI](https://llmock.copilotkit.dev/vertex-ai.html), [Ollama](https://llmock.copilotkit.dev/ollama.html), [Cohere](https://llmock.copilotkit.dev/cohere.html)
+- **[Record-and-replay](https://llmock.copilotkit.dev/record-replay.html)** — VCR-style proxy records real API responses as fixtures for deterministic replay
+- **[Multi-provider support](https://llmock.copilotkit.dev/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.copilotkit.dev/chat-completions.html), [Responses API](https://llmock.copilotkit.dev/responses-api.html), [Anthropic Claude](https://llmock.copilotkit.dev/claude-messages.html), [Google Gemini](https://llmock.copilotkit.dev/gemini.html), [AWS Bedrock](https://llmock.copilotkit.dev/aws-bedrock.html), [Azure OpenAI](https://llmock.copilotkit.dev/azure-openai.html), [Vertex AI](https://llmock.copilotkit.dev/vertex-ai.html), [Ollama](https://llmock.copilotkit.dev/ollama.html), [Cohere](https://llmock.copilotkit.dev/cohere.html)
+- **[MCPMock](https://llmock.copilotkit.dev/mcp-mock.html)** — Mock MCP server with tools, resources, prompts, and session management
+- **[A2AMock](https://llmock.copilotkit.dev/a2a-mock.html)** — Mock A2A protocol server with agent cards, message routing, and streaming
+- **[VectorMock](https://llmock.copilotkit.dev/vector-mock.html)** — Mock vector database with Pinecone, Qdrant, and ChromaDB endpoints
+- **[Services](https://llmock.copilotkit.dev/services.html)** — Built-in search (Tavily), rerank (Cohere), and moderation (OpenAI) mocks
+- **[Chaos testing](https://llmock.copilotkit.dev/chaos-testing.html)** — Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
+- **[Prometheus metrics](https://llmock.copilotkit.dev/metrics.html)** — Request counts, latencies, and fixture match rates at `/metrics`
- **[Embeddings API](https://llmock.copilotkit.dev/embeddings.html)** — OpenAI-compatible embedding responses with configurable dimensions
- **[Structured output / JSON mode](https://llmock.copilotkit.dev/structured-output.html)** — `response_format`, `json_schema`, and function calling
- **[Sequential responses](https://llmock.copilotkit.dev/sequential-responses.html)** — Stateful multi-turn fixtures that return different responses on each call
- **[Streaming physics](https://llmock.copilotkit.dev/streaming-physics.html)** — Configurable `ttft`, `tps`, and `jitter` for realistic timing
- **[WebSocket APIs](https://llmock.copilotkit.dev/websocket.html)** — OpenAI Responses WS, Realtime API, and Gemini Live
- **[Error injection](https://llmock.copilotkit.dev/error-injection.html)** — One-shot errors, rate limiting, and provider-specific error formats
-- **[Chaos testing](https://llmock.copilotkit.dev/chaos-testing.html)** — Probabilistic failure injection: 500 errors, malformed JSON, mid-stream disconnects
-- **[Prometheus metrics](https://llmock.copilotkit.dev/metrics.html)** — Request counts, latencies, and fixture match rates at `/metrics`
- **[Request journal](https://llmock.copilotkit.dev/docs.html)** — Record, inspect, and assert on every request
- **[Fixture validation](https://llmock.copilotkit.dev/fixtures.html)** — Schema validation at load time with `--validate-on-load`
- **CLI with hot-reload** — Standalone server with `--watch` for live fixture editing
- **[Docker + Helm](https://llmock.copilotkit.dev/docker.html)** — Container image and Helm chart for CI/CD pipelines
-- **Record-and-replay** — VCR-style proxy-on-miss records real API responses as fixtures for deterministic replay
- **[Drift detection](https://llmock.copilotkit.dev/drift-detection.html)** — Daily CI runs against real APIs to catch response format changes
- **Claude Code integration** — `/write-fixtures` skill teaches your AI assistant how to write fixtures correctly
+## aimock CLI (Full-Stack Mock)
+
+For projects that need more than LLM mocking, the `aimock` CLI reads a JSON config file and serves all mock services on one port:
+
+```bash
+aimock --config aimock.json --port 4010
+```
+
+See the [aimock documentation](https://llmock.copilotkit.dev/aimock-cli.html) for config file format and Docker usage.
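
A config sketch, purely illustrative — only the `llm` section and path-prefix mounting are documented, so every key name below is an assumption; check the aimock docs for the real schema:

```json
{
  "llm": { "fixtures": "./fixtures", "strict": true },
  "mounts": {
    "/mcp": { "service": "mcp" },
    "/a2a": { "service": "a2a" }
  }
}
```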
+
## CLI Quick Reference
```bash
@@ -97,6 +131,7 @@ llmock [options]
| Option | Short | Default | Description |
| -------------------- | ----- | ------------ | ------------------------------------------- |
+| `--config` | | | Config file for aimock CLI |
| `--port` | `-p` | `4010` | Port to listen on |
| `--host` | `-h` | `127.0.0.1` | Host to bind to |
| `--fixtures` | `-f` | `./fixtures` | Path to fixtures directory or file |
@@ -137,6 +172,19 @@ Full API reference, fixture format, E2E patterns, and provider-specific guides:
**[https://llmock.copilotkit.dev/docs.html](https://llmock.copilotkit.dev/docs.html)**
+## llmock vs MSW
+
+[MSW (Mock Service Worker)](https://mswjs.io/) patches `http`/`https`/`fetch` inside a single Node.js process. llmock runs a real HTTP server on a real port that any process can reach — child processes, microservices, agent workers, Docker containers. MSW can't intercept any of those; llmock can. For a detailed comparison including other tools, see the [full comparison on the docs site](https://llmock.copilotkit.dev/#comparison).
+
+| Capability | llmock | MSW |
+| -------------------------- | ---------------------------- | ---------------------- |
+| Cross-process interception | **Yes** (real server) | No (in-process only) |
+| LLM SSE streaming | **Built-in** (13+ providers) | Manual for each format |
+| Fixture files (JSON) | **Yes** | No (code-only) |
+| Record & replay | **Yes** | No |
+| WebSocket APIs | **Yes** | No |
+| Zero dependencies | **Yes** | No (~300KB) |
+
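
The streaming row above refers to provider wire formats such as OpenAI's chat-completion SSE frames (`data: {json}` blocks terminated by `data: [DONE]`). A minimal consumer sketch, with no SDK involved — the frame contents mirror what a mock like llmock would emit:

```typescript
// Sketch: collect an OpenAI-style chat-completion SSE stream into the full text.
// Frame format (per the OpenAI streaming spec): "data: {json}\n\n", then "data: [DONE]".
function collectSSEText(stream: string): string {
  let text = "";
  for (const line of stream.split("\n")) {
    if (!line.startsWith("data: ")) continue;           // skip blank lines
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break;                    // end-of-stream sentinel
    const chunk = JSON.parse(payload);
    text += chunk.choices?.[0]?.delta?.content ?? "";   // token delta, if any
  }
  return text;
}

// Example frames for a two-token completion:
const frames = [
  'data: {"choices":[{"delta":{"content":"Hello"}}]}',
  'data: {"choices":[{"delta":{"content":" world"}}]}',
  "data: [DONE]",
].join("\n\n");
// collectSSEText(frames) === "Hello world"
```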
## Real-World Usage
[CopilotKit](https://github.com/CopilotKit/CopilotKit) uses llmock across its test suite to verify AI agent behavior across multiple LLM providers without hitting real APIs.
diff --git a/docs/a2a-mock.html b/docs/a2a-mock.html
new file mode 100644
index 0000000..9346e3b
--- /dev/null
+++ b/docs/a2a-mock.html
@@ -0,0 +1,279 @@
+
+
+
+
+
+ A2AMock — llmock
+
+
+
+
+
+
+
+
+
+
+
+
+
A2AMock
+
+ Mock A2A (Agent-to-Agent) protocol server for testing multi-agent systems. Implements the
+ A2A JSON-RPC protocol with agent card discovery, message routing, task management, and SSE
+ streaming.
+
+ The agent card is served at GET /.well-known/agent-card.json and includes all
+ registered agents' skills and capabilities. The A2A-Version: 1.0 header is
+ included on all responses.
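
For reference, an A2A agent card is a JSON document along these lines — the field names follow the public A2A spec, and the exact card llmock serves may differ:

```json
{
  "name": "research-agent",
  "description": "Mock agent for testing",
  "url": "http://localhost:4010/a2a",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "skills": [
    { "id": "search", "name": "Search", "description": "Web search skill" }
  ]
}
```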
+
+
+
Inspection
+
+
+ Inspection API typescript
+
+
a2a.health(); // { status: "ok", agents: 2, tasks: 5 }
+a2a.reset(); // Clears all agents and tasks
+ aimock is the full-stack mock orchestrator. Where llmock serves
+ LLM endpoints only, aimock reads a JSON config file and serves LLM mocks
+ alongside additional mock services (MCP, A2A, vector stores) on a single port.
+
+
+
aimock vs llmock
+
+
+
+
Capability
+
llmock CLI
+
aimock CLI
+
+
+
+
+
LLM mock endpoints
+
Yes
+
Yes
+
+
+
Additional mock services
+
No
+
Yes (via mount)
+
+
+
Config file
+
CLI flags only
+
JSON config file
+
+
+
Single-port routing
+
LLM paths only
+
All services on one port
+
+
+
+
+
Quick Start
+
+
+
Run aimock bash
+
aimock --config aimock.json --port 4010
+
+
+
Config File Format
+
+ The config file is a JSON object describing which services to run and how to configure
+ them. The llm section configures the core LLMock server. Additional services
+ are mounted at path prefixes.
+
- Real HTTP server. Real SSE streams. WebSocket APIs. Fixture-driven responses.
- Multi-provider mock — OpenAI, Claude, Gemini — any process on the machine can reach it.
+ Mock infrastructure for testing AI applications — LLM APIs, MCP tools, A2A agents, vector
+ databases, search, and more. Real HTTP server on a real port. Fixture-driven. Zero
+ dependencies. Any process on the machine can reach it.
@@ -1142,104 +1144,148 @@
Deterministic mock LLM server for testing
+
+
+
+ How you'll use it
+
From unit tests to production CI
+
Four ways to run llmock, depending on what you need.
+
+
+
+
⚡
+
Unit Tests
+
+ In-process programmatic API. Start and stop in your test setup — Vitest, Jest,
+ Playwright, Mocha, anything. TypeScript/JavaScript.
+
+
+
+
🔄
+
Local Development
+
+ CLI with --watch for hot-reload. Edit fixtures, see changes instantly.
+ Point your app at the mock and iterate without real API calls.
+
+
+
+
🏗️
+
CI/CD
+
+ Docker image + --strict mode + record-and-replay. Deterministic,
+ zero-cost, no API keys needed in CI.
+
+
+
+
🌐
+
Cross-Language
+
+ Docker image as a standalone HTTP server. Python, Go, Rust, Ruby, Java — any
+ language that speaks HTTP can hit it.
+
+
+
+
+
+
Why llmock
Stop paying for flaky tests
- Tests that hit real LLM APIs — OpenAI, Gemini, Anthropic — cost money, time out, and
- produce non-deterministic results. llmock replaces those calls with immediate,
- deterministic responses from a real HTTP server any process on the machine can reach.
+ Tests that hit real LLM APIs cost money, time out, and produce non-deterministic results.
+ llmock replaces those calls with immediate, deterministic responses from a real HTTP
+ server any process on the machine can reach.
-
⚡
-
Real HTTP Server
+
🔴
+
Record & Replay
- Runs on an actual port. Any process on the machine can reach it — Next.js, Mastra,
- LangGraph, Agno, anything that speaks HTTP.
+ Proxy to real APIs, record responses as fixtures, then replay them deterministically
+ in tests. VCR-style workflow for zero-effort fixture creation.
📡
-
Authentic SSE Streams
+
13+ LLM Providers
- OpenAI, Claude, and Gemini APIs — authentic SSE format for each provider. Streaming
- and non-streaming modes.
+ OpenAI, Claude, Gemini, Bedrock, Azure, Vertex AI, Ollama, Cohere — authentic
+ SSE format for each provider. Streaming and non-streaming.
-
📁
-
JSON Fixture Files
+
🔌
+
MCP Protocol
- Define responses as JSON — one file per feature. Load a directory, load a file, or
- register fixtures programmatically.
+ Mock MCP servers with tools, resources, prompts, and session management. Test tool
+ integrations without real MCP infrastructure.
-
🔧
-
Tool Call Support
+
🤝
+
A2A Protocol
- Return tool calls with structured arguments. Match on tool names, tool result IDs, or
- write custom predicates.
+ Mock A2A agent-to-agent protocol. Agent cards, message routing, streaming tasks
+ — test multi-agent systems in isolation.
-
💥
-
Error Injection
+
📦
+
Vector Databases
- Queue one-shot errors — 429 rate limits, 503 outages, whatever. Fires once, then
- auto-removes itself.
+ Mock Pinecone, Qdrant, and ChromaDB endpoints. Test RAG pipelines without running real
+ vector databases.
-
📋
-
Request Journal
+
🎲
+
Chaos Testing
- Every request recorded. Inspect messages, verify tool calls, assert on conversation
- history. HTTP and programmatic access.
+ Probabilistic failure injection — random errors, latency spikes, and stream
+ corruption for resilience testing.
-
🔌
-
WebSocket APIs
+
📊
+
Prometheus Metrics
- OpenAI Responses, OpenAI Realtime, and Gemini Live over WebSocket. Same fixtures, real
- RFC 6455 framing, zero dependencies. Text + tool calls.
+ Expose request counts, latencies, and fixture match rates via a /metrics endpoint.
+ Grafana-ready.
-
🎛️
-
Streaming Physics
+
🔧
+
Tool Call Support
- Simulate realistic streaming timing with TTFT, TPS, and jitter. Test loading states
- and streaming UX under real-world conditions.
+ Return tool calls with structured arguments. Match on tool names, tool result IDs, or
+ write custom predicates.
-
🎲
-
Chaos Testing
+
🔌
+
WebSocket APIs
- Probabilistic failure injection — random errors, latency spikes, and stream
- corruption for resilience testing.
+ OpenAI Responses, OpenAI Realtime, and Gemini Live over WebSocket. Same fixtures, real
+ RFC 6455 framing, zero dependencies.
-
📊
-
Prometheus Metrics
+
🎛️
+
Streaming Physics
- Expose request counts, latencies, and fixture match rates via a /metrics endpoint.
- Grafana-ready.
+ Simulate realistic streaming timing with TTFT, TPS, and jitter. Test loading states
+ and streaming UX under real-world conditions.
-
🔴
-
Record & Replay
+
📋
+
Request Journal
- Proxy to real APIs, record responses as fixtures, then replay them deterministically
- in tests.
+ Every request recorded. Inspect messages, verify tool calls, assert on conversation
+ history. HTTP and programmatic access.
@@ -2019,13 +2065,13 @@
Real-World Usage
without hitting real APIs. The tests cover streaming text, tool calls, and multi-turn
conversations across both v1 and v2 runtimes. See the
test suite
and
fixture files
diff --git a/docs/mcp-mock.html b/docs/mcp-mock.html
new file mode 100644
index 0000000..19d7d9b
--- /dev/null
+++ b/docs/mcp-mock.html
@@ -0,0 +1,291 @@
+
+
+
+
+
+ MCPMock — llmock
+
+
+
+
+
+
+
+
+
+
+
+
+
MCPMock
+
+ Mock MCP (Model Context Protocol) server for testing tool integrations. Implements the
+ Streamable HTTP transport with JSON-RPC dispatch, session management, and full
+ tools/resources/prompts support.
+
+
+
Quick Start
+
+
+ Standalone mode typescript
+
+
import { MCPMock } from "@copilotkit/llmock";
+
+const mcp = new MCPMock();
+
+mcp.addTool({ name: "search", description: "Search the web" });
+mcp.onToolCall("search", (args) => {
+ return `Results for: ${(args as { query: string }).query}`;
+});
+
+const url = await mcp.start();
+// Point your MCP client at `url`
+
+
+
Mounted Mode
+
+ Mount MCPMock onto an LLMock server to share a single port with LLM mocking and other
+ services:
+
+ MCPMock implements full session management per the MCP Streamable HTTP spec. Each
+ initialize request creates a new session, and the session ID is returned via
+ the Mcp-Session-Id header. All subsequent requests must include this header.
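
The rule can be sketched as a tiny session table — an illustration of the behavior described above, not MCPMock's actual implementation:

```typescript
// Sketch of the MCP session rule: "initialize" mints a session ID
// (returned via the Mcp-Session-Id header); later calls must present it.
import { randomUUID } from "node:crypto";

const sessions = new Set<string>();

function handle(method: string, sessionId?: string): { sessionId?: string; ok: boolean } {
  if (method === "initialize") {
    const id = randomUUID();
    sessions.add(id);                 // session created on initialize
    return { sessionId: id, ok: true };
  }
  // every other method is rejected without a known session ID
  return { ok: sessionId !== undefined && sessions.has(sessionId) };
}
```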
+
+
+
+
+
+
Method
+
Description
+
+
+
+
+
initialize
+
Creates session, returns capabilities and session ID
+
+
+
tools/list
+
Lists all registered tools
+
+
+
tools/call
+
Calls a tool by name with arguments
+
+
+
resources/list
+
Lists all registered resources
+
+
+
resources/read
+
Reads a resource by URI
+
+
+
prompts/list
+
Lists all registered prompts
+
+
+
prompts/get
+
Gets a prompt by name with arguments
+
+
+
ping
+
Returns empty object (health check)
+
+
+
DELETE /
+
Destroys a session
+
+
+
+
+
Inspection
+
+
+ Inspection API typescript
+
+
mcp.health(); // { status: "ok", tools: 2, resources: 1, prompts: 0, sessions: 1 }
+mcp.getSessions(); // Map of active sessions
+mcp.getRequests(); // Journal entries (when mounted with shared journal)
+mcp.reset(); // Clears all tools, resources, prompts, and sessions
+ Mount additional mock services onto a running LLMock server. All services share one port,
+ one health endpoint, and one request journal — no port juggling, no service
+ discovery.
+
+
+
Mountable Interface
+
+ Any object that implements the Mountable interface can be mounted onto
+ LLMock. The interface requires a single method:
+
+ Mount a Mountable service at a path prefix. Requests matching the prefix are
+ forwarded to the service with the prefix stripped.
+
+
+
+
mount() API typescript
+
const llm = new LLMock({ port: 5555 });
+
+llm.mount("/mcp", mcpMock); // MCP tools at /mcp
+llm.mount("/a2a", a2aMock); // A2A agents at /a2a
+
+await llm.start();
+// All protocols accessible on port 5555
+
+
+
Path Stripping
+
+ When a request arrives at a mounted path, the prefix is stripped before the service sees
+ it. For example, a request to /mcp/tools/list arrives at the MCP service with
+ pathname /tools/list.
+
+
+
+
+
+
Incoming Request
+
Mount Prefix
+
Service Sees
+
+
+
+
+
POST /mcp/tools/list
+
/mcp
+
/tools/list
+
+
+
POST /a2a/agents/run
+
/a2a
+
/agents/run
+
+
+
GET /mcp
+
/mcp
+
/
+
+
+
+
+
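
The stripping rule in the table is simple enough to sketch — illustrative only, not the library's code:

```typescript
// Sketch of the mount-path stripping rule described above:
// the mount prefix is removed before the service sees the request.
function stripMountPrefix(pathname: string, prefix: string): string {
  if (pathname === prefix) return "/";                      // bare prefix maps to root
  if (!pathname.startsWith(prefix + "/")) return pathname;  // not under this mount
  return pathname.slice(prefix.length);                     // drop the prefix
}
// stripMountPrefix("/mcp/tools/list", "/mcp") === "/tools/list"
```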
WebSocket Upgrade Support
+
+ If a mounted service implements handleUpgrade(), WebSocket upgrade requests
+ matching the mount prefix are forwarded to it. This enables WebSocket-based protocols like
+ MCP over StreamableHTTP or custom agent protocols.
+
+
+
Unified Health Endpoint
+
+ The GET /health endpoint aggregates health from all mounted services. Each
+ service that implements health() is included in the response:
+
+ createMockSuite() provides a unified lifecycle for LLMock and all mounted
+ services. It creates the server, mounts services, and returns start() /
+ stop() / reset() methods that manage everything together.
+
+ Record once against real APIs, then replay from fixtures for fast, offline development.
+
+
+
+
Record then replay bash
+
# First run: record real API responses
+llmock --record --provider-openai https://api.openai.com -f ./fixtures
+
+# Subsequent runs: replay from recorded fixtures
+llmock -f ./fixtures
+
+
+
CI Pipeline Workflow
+
+ Use the Docker image in CI with --strict mode to ensure every request matches
+ a recorded fixture. No API keys needed, no flaky network calls.
+