Skip to content

refactor(web-integration): tighten CDP proxy ownership and split helpers#2373

Closed
quanru wants to merge 9 commits intoweb-infra-dev:mainfrom
quanru:refactor/cdp-proxy-ownership
Closed

refactor(web-integration): tighten CDP proxy ownership and split helpers#2373
quanru wants to merge 9 commits intoweb-infra-dev:mainfrom
quanru:refactor/cdp-proxy-ownership

Conversation

@quanru
Copy link
Copy Markdown
Collaborator

@quanru quanru commented Apr 17, 2026

Summary

Architectural follow-up to #2357. Two refactors that the original PR's review surfaced as worth doing but too invasive to fold into the bug-fix PR itself:

  1. Move proxy metadata ownership back to the proxy. killProxy() previously sent SIGTERM and immediately unlinked PROXY_*_FILE, racing with the proxy's own cleanupIfOwned() SIGTERM handler. The race was masked by a defensive "PID file missing → bail" branch added inside cleanupIfOwned(). This PR makes killProxy() only signal + poll for the child to exit (with a 2s grace + SIGKILL fallback), so the proxy is the sole writer/deleter of its own metadata files. The cleanupIfOwned() defensive branch becomes load-bearing only for the SIGKILL fallback path.
  2. Split mcp-tools-cdp.ts into focused modules. It had grown to handle six concerns: CDP URL resolution, proxy lifecycle, proxy metadata IO, cross-command targetId persistence, puppeteer agent construction, and MCP tool schema. Extract two new files:
    • cdp-proxy-manager.ts — proxy lifecycle + metadata IO + URL resolution.
    • cdp-target-store.ts — cross-command targetId persistence.
    • mcp-tools-cdp.ts keeps only WebCdpMidsceneTools and the puppeteer getTargetId() glue.

External surface (WebCdpMidsceneTools) is unchanged.

Dependencies

This PR is stacked on top of #2357. The diff currently includes #2357's commits because that PR has not merged yet. After #2357 lands I will rebase this branch onto main so the diff cleans up to just the two refactor commits below.

Until then the refactor commits to review are:

  • 0f08a80 refactor(web-integration): let CDP proxy own its metadata cleanup
  • e35ff41 refactor(web-integration): split CDP-mode helpers out of mcp-tools-cdp

Test plan

🤖 Generated with Claude Code

BH4HPA and others added 9 commits April 16, 2026 10:57
…DP endpoints

The CDP proxy was reused solely based on process liveness, ignoring which
Chrome endpoint it was connected to. Switching from a local to a remote
CDP endpoint would silently route commands through the wrong browser.

- Add PROXY_UPSTREAM_FILE to record the Chrome endpoint the proxy connects to
- Validate upstream match before reusing a proxy; kill and respawn on mismatch
- Fix cleanupIfOwned() race: skip cleanup when PID file is missing or unreadable,
  preventing a dying proxy from deleting a newly spawned proxy's metadata files

Closes web-infra-dev#2354

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Each CLI command is a separate process that reconnects to Chrome via the
CDP proxy. The proxy kept a single persistent upstream WebSocket, so
Chrome never re-sent targetCreated events for the second client, causing
browser.pages() to return 0 and every command to open a new about:blank.

- Proxy reconnects upstream WebSocket when all downstream clients
  disconnect, resetting Chrome's CDP target discovery state
- Buffer downstream messages during upstream reconnect and flush on open
- Clear pending buffer when all clients disconnect to avoid replaying
  stale commands from already-gone clients
- Persist targetId to a temp file so subsequent commands can locate the
  exact tab opened by `connect`, even among many open tabs
- Clean up targetId file on explicit `disconnect`

Closes web-infra-dev#2355

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…undant permission popup

When all downstream clients disconnected, the proxy eagerly reconnected
its upstream WebSocket to Chrome, triggering a second "Allow remote
debugging" permission popup even though nobody was using the connection.
Defer the reconnect until the next client actually connects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…t to prevent process hang

The spawned CDP proxy child process's stdout pipe was never destroyed
after reading the endpoint, keeping the parent Node.js event loop alive
and causing the CLI to hang on the first run after Chrome restart.

Fixes web-infra-dev#2368

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Any startup failure of the CDP proxy child process previously collapsed
into a single opaque line in the parent CLI:

  [cdp] proxy failed, falling back to direct connection:
    Error: Proxy exited with code 0 before ready

This hid at least four very different root causes (upstream closed,
upstream error, duplicate-proxy race, startup timeout) behind the same
log, leaving remote CDP failures effectively undiagnosable.

Parent side (mcp-tools-cdp.ts, spawnProxy):
- Stop discarding the child's stderr (stdio 'pipe' for fd 2).
- Keep a bounded 8 KB rolling stderr buffer and attach its tail to
  the error thrown for timeout, exit-before-ready, and spawn-error
  paths. Distinguish exit-by-code from exit-by-signal.
- Destroy both stdout and stderr pipes after the endpoint is read so
  the parent event loop can drain (matches 9d4f186 for stdout).

Child side (cdp-proxy.ts):
- Include the WebSocket close code and reason in the upstream close
  shutdown diagnostic.
- Write an explicit "duplicate proxy detected" line to stderr before
  exiting the duplicate-proxy branch (it previously exited silently).
- Introduce exitWithStderr(): write the diagnostic and wait for its
  drain callback before calling process.exit, with a 500 ms fallback
  timer. On POSIX, stderr-over-pipe is asynchronous, so exit()
  immediately after write() can drop the very diagnostic this patch
  is trying to surface. shutdown() routes through the same helper.

Tests:
- New regression test confirming the duplicate-proxy branch emits
  the stderr line, exits 0, and does not wipe the live proxy's
  metadata files.

Refs web-infra-dev#2369

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ct Object]"

Many SDK/transport layers reject with plain objects ({code, message},
{error: {message}}, {cause: {message}}). The previous fallback of
String(error) collapsed these to "[object Object]", hiding the root cause
from both console logs and user-facing MCP tool results.

The new getErrorMessage helper walks the common shapes, falls back to
JSON.stringify, and finally to Object.prototype.toString so non-Error
rejections always produce something actionable. Replaces 8 inline
duplicates across tool-generator.ts and base-server.ts.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code-review followups on PR web-infra-dev#2357:

- mcp-tools-cdp.ts: extract `getTargetId()` helper so the puppeteer
  private-field access (`(page.target() as any)._targetId`) lives in one
  place. Log a debug message when it returns undefined so a future
  puppeteer rename does not silently disable cross-command tab reuse.
- mcp-tools-cdp.ts: `killProxy()` now also calls `cleanupTargetIdFile()`
  so a saved targetId from the previous Chrome cannot leak across an
  upstream switch. Replace bare catch with debug-logging catch so EPERM
  vs ESRCH does not vanish.
- cdp-proxy.ts: refresh the file-header docblock to cover the lazy
  upstream reconnect, the duplicate-proxy exit path, and the
  PROXY_UPSTREAM_FILE bookkeeping. Promote the 500ms exit fallback to a
  named `STDERR_FLUSH_FALLBACK_MS` constant.
- cdp-proxy.test.ts: add an integration test that drives the real
  `getProxyEndpoint()` / `killProxy()` path across an upstream switch,
  without manual file cleanup in between, to guard the duplicate-proxy
  race window.
- mcp-tools-cdp.ts: expose internal helpers via a `__test__` namespace
  for the new test.
`killProxy()` previously sent SIGTERM and immediately unlinked
PROXY_ENDPOINT_FILE / PROXY_PID_FILE / PROXY_UPSTREAM_FILE itself,
racing with the proxy's own `cleanupIfOwned()` SIGTERM handler. The
race was masked by a defensive "PID file missing → bail" branch added
to `cleanupIfOwned()` in web-infra-dev#2357.

Move ownership of the proxy metadata files back to the proxy:

- `killProxy()` now only signals the child (SIGTERM) and polls
  PROXY_PID_FILE for up to 2s, escalating to SIGKILL + a metadata
  sweep if the child is unresponsive.
- The cross-command targetId file still belongs to mcp-tools and is
  cleared up front, regardless of the proxy's state.
- `getProxyEndpoint()` now `await`s `killProxy()` so the next
  `spawnProxy()` cannot race the dying child's duplicate-proxy guard.
`mcp-tools-cdp.ts` had grown to handle six concerns at once: CDP URL
resolution, proxy process lifecycle, proxy metadata IO, cross-command
targetId storage, puppeteer agent construction and the MCP tool schema.
Split the proxy-side and store-side concerns into dedicated modules so
the file is just the tool class plus its puppeteer glue.

- New `cdp-proxy-manager.ts` owns the proxy lifecycle and its on-disk
  metadata: `isProxyAlive`, `readProxyEndpoint`, `readProxyUpstream`,
  `killProxy`, `getProxyEndpoint`, plus the page-level URL resolver.
  All of these are now real module exports rather than file-private
  helpers behind a `__test__` namespace.
- New `cdp-target-store.ts` owns the cross-command targetId persistence
  (`readSavedTargetId`, `saveTargetId`, `cleanupTargetIdFile`).
- `mcp-tools-cdp.ts` keeps `WebCdpMidsceneTools`, the puppeteer
  `getTargetId()` glue and the target-discovery delay constant. Its
  external surface (`WebCdpMidsceneTools`) is unchanged.
- Tests now import from `cdp-proxy-manager.js` directly; the
  `__test__` namespace is gone.
@quanru
Copy link
Copy Markdown
Collaborator Author

quanru commented Apr 17, 2026

Reopening from upstream branch instead of fork (maintainer workflow).

@quanru quanru closed this Apr 17, 2026
@quanru quanru deleted the refactor/cdp-proxy-ownership branch April 17, 2026 10:38
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e35ff41345

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +249 to +252
if (needsUpstreamReconnect && upstream.readyState === WebSocket.OPEN) {
reconnectUpstream();
needsUpstreamReconnect = false;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear reconnect marker when first client consumes it

needsUpstreamReconnect is only reset inside the upstream.readyState === WebSocket.OPEN branch, so it can stay true if a client connects while the upstream is still CONNECTING (for example, after a fast connect/disconnect cycle during reconnect). In that state, a later client connection can trigger reconnectUpstream() again even though there are already active clients, which resets the CDP session mid-run and can drop in-flight commands/events for the connected client(s).

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants