
Recover browser MCP calls after broken-pipe transport failures #13100

Closed
swordfish444 wants to merge 2 commits into openai:main from swordfish444:codex/mcp-transport-recovery

@swordfish444
Contributor

Summary

  • make MCP browser-tool recovery robust when the transport dies with Transport send error / Broken pipe (not just literal Transport closed)
  • keep one-shot restart + retry behavior in McpConnectionManager
  • add targeted unit coverage for transport-error matching
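The one-shot restart and retry behavior can be illustrated with a minimal synchronous sketch. This is not the code in McpConnectionManager (which is not shown in this PR description and is presumably async); `call_with_one_restart` and its closure parameters are hypothetical names chosen for illustration, and errors are modeled as plain strings.

```rust
/// Hypothetical helper: run `call`; if it fails with a transport-level error,
/// restart the server once and retry exactly one more time.
/// Errors are modeled as `String` purely for illustration.
fn call_with_one_restart<T>(
    mut call: impl FnMut() -> Result<T, String>,
    mut restart: impl FnMut() -> Result<(), String>,
    is_transport_error: impl Fn(&str) -> bool,
) -> Result<T, String> {
    match call() {
        // Only transport failures trigger the restart path.
        Err(err) if is_transport_error(&err) => {
            // If the restart itself fails, surface that error to the caller.
            restart()?;
            // Single retry: a second failure is returned as-is, never looped.
            call()
        }
        // Success, or a non-transport error, is passed through unchanged.
        other => other,
    }
}
```

Capping the retry at one attempt keeps a persistently broken server from causing a restart loop: a second failure is reported to the caller like any other error.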

Why

On macOS, browser MCP transports (notably chrome-devtools and playwright) can die mid-session, surfacing as broken-pipe send errors. The previous matcher only recognized Transport closed, so recovery did not trigger for these common failures.

Changes

  • broadened is_transport_closed_error(...) in
    codex-rs/core/src/mcp_connection_manager.rs to match:
    • transport closed
    • transport send error
    • transport receive error
    • transport errors containing broken pipe / connection reset
  • added tests:
    • is_transport_closed_error_matches_transport_closed
    • is_transport_closed_error_matches_transport_broken_pipe
    • is_transport_closed_error_does_not_match_non_transport_error
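As a rough sketch of the broadened matching rules listed above, the function below works on a plain error message. The `&str` signature, the case-insensitive comparison, and the exact substring checks are assumptions made for illustration; the real is_transport_closed_error(...) in codex-rs/core/src/mcp_connection_manager.rs may take an error type and match differently.

```rust
/// Illustrative stand-in for the broadened matcher (assumed `&str` input).
/// Returns true only for messages that look like MCP transport failures.
fn is_transport_closed_error(message: &str) -> bool {
    // Assumption: matching is case-insensitive.
    let msg = message.to_ascii_lowercase();

    // Every accepted pattern is a transport error, so bail out early otherwise.
    if !msg.contains("transport") {
        return false;
    }

    msg.contains("transport closed")
        || msg.contains("transport send error")
        || msg.contains("transport receive error")
        // Transport errors that mention the underlying OS-level failure.
        || msg.contains("broken pipe")
        || msg.contains("connection reset")
}
```

Requiring the word "transport" before accepting "broken pipe" or "connection reset" keeps unrelated I/O failures, such as a tool reporting its own network error, from triggering a server restart.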

Validation

Automated

  • just fmt
  • just fix -p codex-core
  • cargo test -p codex-core is_transport_closed_error
  • cargo test -p codex-core mcp_tool_call_recovers_from_transport_closed

End-to-end (macOS, patched app-server)

Server:

  • ./target/debug/codex app-server --listen ws://127.0.0.1:4322

Chrome DevTools MCP:

  1. first turn tool call succeeds (mcp__chrome-devtools__new_page)
  2. force-close child MCP process (kill -9 10162)
  3. second turn in same thread succeeds without app restart
  4. server log confirms recovery path:
    • MCP transport closed for server chrome-devtools; restarting

Playwright MCP:

  1. first turn tool call succeeds (mcp__playwright__browser_tabs)
  2. force-close child MCP process (kill -9 10154 11782)
  3. second turn in same thread succeeds without app restart
  4. server log confirms recovery path:
    • MCP transport closed for server playwright; restarting

Related: #6649

@swordfish444
Contributor Author

Adding reviewers based on issue history and MCP ownership signals: @etraut-openai @mzeng-openai. This PR includes a macOS end-to-end repro and validation for browser MCP transport drops (Chrome DevTools and Playwright): after a forced process kill, recovery happens in-session, with no Codex app restart required.

@etraut-openai
Collaborator

@swordfish444 Please do not submit PRs. If you want to suggest a new feature, file a feature request.


2 participants