feat(sdk): Communicate Mode SDK (on_relay) for Python and TypeScript #618
khaliqgant wants to merge 11 commits into main from
Implement the Connect SDK spec: a new "Communicate Mode" that lets any framework agent join Relaycast with a single on_relay() call.

Python SDK:
- Relay core (lazy WebSocket, send/post/reply/inbox/agents/close)
- RelayTransport (HTTP + WS, auto-reconnect, exponential backoff)
- 6 framework adapters: OpenAI Agents, Claude SDK, Google ADK, Agno, Swarms, CrewAI
- Tier 1 (Push) adapters inject messages mid-execution via hooks/callbacks
- Tier 2 (Poll) adapters surface messages at tool-call boundaries
- 96 tests (unit + integration + cross-framework)

TypeScript SDK:
- Relay core, transport, types mirroring Python API
- 2 framework adapters: Pi (session.steer), Claude SDK (PostToolUse hook)
- 16 tests (unit + integration + cross-framework)

Docs:
- communicate.mdx overview + 7 per-framework guides
- Plain markdown mirrors (docs-sync rule)
- Updated introduction.mdx with Communicate Mode section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Resolves npm ci failure due to missing @sinclair/typebox@0.34.48 in lock file.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add subpath exports for ./communicate/adapters/pi and ./communicate/adapters/claude-sdk so users can import directly
- Add withRelay as alias for onRelay (spec references both names)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add ping/pong handling to both Python and TypeScript transports (server pings were silently ignored, causing connection drops)
- Fix OpenAI Agents module check: "agents" not "openai_agents" (pip package imports as `agents`, not `openai_agents`)
- Use startswith() instead of `in` for module matching per spec
- Remove Claude SDK from Python auto-detect (spec says import adapter directly since it uses query options, not agent objects)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CRITICAL:
- Make communicate import lazy in __init__.py (was crashing SDK users without aiohttp installed via eager import chain)
- Guard aiohttp import with helpful ImportError message
- Standardize Claude SDK adapter signature to (options, relay) matching other adapters, with relay optional

HIGH:
- Fix OpenAI Agents import: `from agents import function_tool` (not `from openai_agents`); the package imports as `agents`
- Fix CrewAI import: `from crewai.tools import tool` (not `from langchain_core.tools`), which was the wrong dependency
- Make relay param optional in all adapters with auto-creation
- Add ImportError messages for missing framework packages
- Fix instructions_wrapper to accept *args, **kwargs (OpenAI Agents and Agno pass (ctx, agent) to instructions callable)

MEDIUM:
- Add readonly to TypeScript Message interface fields
- Add per-framework optional dependency groups to pyproject.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Python SDK:
- Fix all 5 adapter compliance issues (prepend inbox, await async callables, Content parts for ADK, dynamic crewai backstory, claude_sdk signature with explicit name param)
- Fix test import mismatches (agents vs openai_agents, crewai.tools vs langchain_core.tools, claude_sdk positional arg order)
- Add fallback HTTP polling when WebSocket fails
- Fix message routing "both" case (callbacks + buffer)
- Conditional __all__ exports for communicate module
- Add auto-detect on_relay() tests
- Add pytest-cov to dev deps

TypeScript SDK:
- Add transport.test.ts with 19 tests (HTTP, WebSocket, errors)
- Add JSDoc to all public functions in core, types, adapters
- Replace any types with proper type guards in index.ts
- Reduce Pi adapter from 129→101 lines via data-driven tool defs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix claude_sdk adapter argument order: on_relay(options, relay, *, name) to match other adapters' (agent, relay) pattern
- Fix return type annotation: callable -> Callable[[], None] in core.py
- URL-encode WebSocket token in transport.py to prevent connection failures
- Add retry for aiohttp.ClientError in HTTP transport (was only retrying 5xx)
- Add Relay.peek() method and use it in OpenAI Agents/Agno instructions wrappers to avoid draining inbox and starving the relay_inbox tool
- Remove transport re-export from TS communicate/index.ts (internal detail)
- Improve error message for TS onRelay() auto-detection failures
- Add SIGTERM/SIGINT handlers for autoCleanup in TS core.ts
- Use thread-based fallback in _run_sync for async context compatibility
- Fix @agent-relay/config version: 3.1.23 -> 3.2.1 in SDK package.json
- Update spec: withRelay references -> onRelay
- Add per-framework guide links to docs/markdown/communicate.md (docs sync)
- Update all tests for new signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add connect-dedup in Python Relay._ensure_connected to prevent concurrent double-registration (mirrors TS connectPromise pattern)
- Guard communicate import in shim agent_relay/__init__.py with try/except for users without aiohttp
- TS handleTransportMessage now always buffers messages (spec: "both" case, callbacks AND inbox), matching Python behavior
- Google ADK adapter preserves original before_model_callback return value for short-circuit support
- TS transport retries on network errors with exponential backoff, matching Python transport behavior
- Exclude completion-pipeline and e2e-owner-review vitest tests from tsconfig.json (in addition to tsconfig.build.json)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
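
For readers unfamiliar with the retry behavior this commit describes, here is a minimal sketch of retrying with exponential backoff. It is not the SDK's transport code: `backoffDelayMs`, `retryWithBackoff`, the 250 ms base delay, the 8 s cap, and the attempt count are all assumptions chosen for illustration.

```typescript
// Hypothetical helpers for illustration; the real transport may differ.

/** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/**
 * Runs `operation`, retrying on any thrown error up to `maxAttempts` times in total.
 * `sleep` is injectable so tests can avoid real timers.
 */
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

This sketch retries every error. A real transport would likely retry only network errors and 5xx responses and surface configuration or authentication errors immediately.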

Cherry-picked communicate-related changes from 2a9b7ef.
- Python _ensure_connected: wrap in outer try/except to resolve _connect_future on fallback exception
- TypeScript ensureConnected: reset connectPromise on rejection for retry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
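
The connect-dedup and reset-on-rejection behavior named in these commits can be sketched as follows. This is an illustration under assumptions, not the SDK's `ensureConnected`: the `LazyConnection` class and its constructor argument are hypothetical, and only the `connectPromise` field name comes from the commit message.

```typescript
// Hypothetical wrapper class; only the `connectPromise` name is taken from the commit text.
class LazyConnection {
  private connectPromise: Promise<void> | null = null;

  constructor(private readonly connect: () => Promise<void>) {}

  /**
   * Concurrent callers share one in-flight connect attempt.
   * If that attempt rejects, the cached promise is cleared so a later call can retry.
   */
  ensureConnected(): Promise<void> {
    if (this.connectPromise === null) {
      this.connectPromise = this.connect().catch((err: unknown) => {
        this.connectPromise = null; // allow a fresh attempt next time
        throw err;                  // still report the failure to current waiters
      });
    }
    return this.connectPromise;
  }
}
```

Without the reset, a single failed connect would leave a permanently rejected promise cached, and every later call would fail without trying again.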

Cherry-picked communicate-related changes from 65cd6cf.
- TS onRelay: empty object {} now falls through to Claude SDK adapter
- Python openai_agents/agno: use inbox() instead of peek() to drain messages
- Python core: re-raise RelayConfigError/RelayAuthError before fallback
- Python crewai: always call inbox_sync() regardless of event loop state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Resolve conflicts by accepting main's deletions for restructured docs, preserving feature branch's SDK improvements for communicate adapters, and accepting main's package-lock.json and config files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

      workspace: config.workspace ?? process.env.RELAY_WORKSPACE,
      apiKey: config.apiKey ?? process.env.RELAY_API_KEY,
    - baseUrl: trimTrailingSlashes(config.baseUrl ?? process.env.RELAY_BASE_URL ?? DEFAULT_RELAY_BASE_URL),
    + baseUrl: (config.baseUrl ?? process.env.RELAY_BASE_URL ?? DEFAULT_RELAY_BASE_URL).replace(/\/+$/, ''),
Check failure
Code scanning / CodeQL
Polynomial regular expression used on uncontrolled data (severity: High)
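
One way to avoid this class of finding is to strip trailing slashes without a regular expression. Analyzers tend to flag `/\/+$/` because, on input with long runs of slashes that are not at the end, a backtracking engine may retry the match from many start positions. The sketch below is an assumption about what a helper like `trimTrailingSlashes` could look like; the diff only shows that name, not its implementation.

```typescript
/**
 * Removes every trailing "/" by scanning from the end of the string.
 * Runs in time linear in the number of trailing slashes; no regex involved.
 * (Illustrative implementation; the helper in the codebase may differ.)
 */
function trimTrailingSlashes(url: string): string {
  let end = url.length;
  while (end > 0 && url.charCodeAt(end - 1) === 47 /* "/" */) {
    end -= 1;
  }
  return url.slice(0, end);
}
```

Slashes elsewhere in the string are left untouched, so only the end of a base URL is affected.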

    await new Promise<void>((resolve) => {
      socket.once('close', () => resolve());
      socket.close();
    });

🔴 WebSocket disconnect can hang indefinitely after timeout removal
The disconnect() method previously used Promise.race with a 2-second timeout to prevent hanging if the WebSocket server is unresponsive during the close handshake. The new code awaits the close event with no timeout at all. If the remote server is unresponsive, the ws library's close() sends a close frame and then waits for the TCP connection to fully terminate, which can take minutes (TCP keepalive/timeout). This causes disconnect() — and anything awaiting it like Relay.close() — to hang indefinitely.
Old code with timeout vs new code without

Old code:

    await Promise.race([
      new Promise<void>((resolve) => {
        socket.once('close', () => resolve());
        socket.close();
      }),
      new Promise<void>((resolve) => setTimeout(resolve, 2_000)),
    ]);

New code:

    await new Promise<void>((resolve) => {
      socket.once('close', () => resolve());
      socket.close();
    });

Callers like Relay.close() (packages/sdk/src/communicate/core.ts:118) directly await this.transport.disconnect(), so the hang propagates to user code in finally blocks and cleanup paths.

    - await new Promise<void>((resolve) => {
    -   socket.once('close', () => resolve());
    -   socket.close();
    - });
    + await Promise.race([
    +   new Promise<void>((resolve) => {
    +     socket.once('close', () => resolve());
    +     socket.close();
    +   }),
    +   new Promise<void>((resolve) => setTimeout(resolve, 2_000)),
    + ]);
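
A variant of the suggested fix is sketched below as a standalone helper. It bounds the wait in the same way but also clears the timer once the socket closes, so a pending 2-second timer does not outlive a fast disconnect. The `ClosableSocket` interface and the `closeWithTimeout` name are hypothetical, introduced only to keep the example self-contained.

```typescript
// Minimal shape needed for the sketch; a real WebSocket client exposes more than this.
interface ClosableSocket {
  once(event: 'close', listener: () => void): unknown;
  close(): void;
}

/**
 * Starts the close handshake and resolves when the 'close' event fires
 * or when `timeoutMs` elapses, whichever happens first.
 */
function closeWithTimeout(
  socket: ClosableSocket,
  timeoutMs = 2_000,
): Promise<'closed' | 'timeout'> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve('timeout'), timeoutMs);
    socket.once('close', () => {
      clearTimeout(timer); // do not leave the timer pending after a clean close
      resolve('closed');
    });
    socket.close();
  });
}
```

Returning which branch won lets a caller decide whether to take a harder teardown path after a timeout; that follow-up is left out here.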
Summary
on_relay) enabling agents to receive tasks via relay messaging
withRelay alias

Commits
on_relay SDK for Python and TypeScript
withRelay alias

Test plan
@agent-relay/sdk/communicate)

🤖 Generated with Claude Code