
Allow simultaneous inputs on AIChat#934

Draft
deathbyknowledge wants to merge 3 commits into main from resumability

Conversation

@deathbyknowledge (Member) commented Feb 18, 2026

Fixes #24: AIChatAgent now handles multiple messages arriving simultaneously by queuing them and processing one at a time.

Problem

When multiple chat messages arrived before the first response completed, the onChatMessage → _reply cycles would race, causing:

  • Interleaved streaming responses
  • Potential message corruption
  • Unpredictable behavior

Solution

Added a request queue that serializes chat message processing:

  • _chatRequestQueue — Array of pending work functions
  • _chatRequestInFlight — Guard to prevent concurrent processing
  • _enqueueChatRequest() — Adds work to queue, kicks off drain if idle
  • _drainChatRequestQueue() — Processes queue items sequentially
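The queue pattern described above can be sketched in isolation. The field and method names mirror the PR description, but the body below is an assumption for illustration, not the actual implementation:

```typescript
// Minimal sketch of the serialization pattern from the PR description.
// Names (_chatRequestQueue, _enqueueChatRequest, _drainChatRequestQueue)
// come from the PR text; the bodies are assumed.
type ChatWork = () => Promise<void>;

class ChatQueue {
  private _chatRequestQueue: ChatWork[] = [];
  private _chatRequestInFlight = false;

  // Adds work to the queue and kicks off a drain if one isn't running.
  _enqueueChatRequest(work: ChatWork): void {
    this._chatRequestQueue.push(work);
    if (!this._chatRequestInFlight) {
      void this._drainChatRequestQueue();
    }
  }

  // Processes queued items one at a time; the while/shift loop naturally
  // picks up items enqueued while earlier work was still running.
  private async _drainChatRequestQueue(): Promise<void> {
    this._chatRequestInFlight = true;
    try {
      while (this._chatRequestQueue.length > 0) {
        const work = this._chatRequestQueue.shift()!;
        try {
          await work();
        } catch (err) {
          // Errors are contained so one failed request can't wedge the queue.
          console.error("chat request failed", err);
        }
      }
    } finally {
      this._chatRequestInFlight = false;
    }
  }
}
```

Because the guard flag is checked synchronously before the first `await`, two back-to-back enqueues start exactly one drain, and the second item waits its turn.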

Changes

  • Queue mechanism for serializing onChatMessage calls
  • Tool result continuations now use the queue instead of waitForStream() + 500ms timeout hack
  • Tool approval continuations similarly refactored
  • Removed dead _streamCompletionPromise / _streamCompletionResolve code (no longer needed)
  • Queue cleared on destroy() and CF_AGENT_CHAT_CLEAR

@changeset-bot (bot) commented Feb 18, 2026

🦋 Changeset detected

Latest commit: acf2e6f

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

  Name                 Type
  @cloudflare/ai-chat  Patch


@pkg-pr-new (bot) commented Feb 18, 2026


npm i https://pkg.pr.new/cloudflare/agents@934
npm i https://pkg.pr.new/cloudflare/agents/@cloudflare/ai-chat@934
npm i https://pkg.pr.new/cloudflare/agents/@cloudflare/codemode@934
npm i https://pkg.pr.new/cloudflare/agents/hono-agents@934

commit: acf2e6f

@deathbyknowledge (Member, Author) commented:

/bonk review pretty please

@ask-bonk (bot, Contributor) commented Feb 18, 2026

remote: Permission to cloudflare/agents.git denied to github-actions[bot].
fatal: unable to access 'https://github.com/cloudflare/agents/': The requested URL returned error: 403

github run

@ask-bonk (bot, Contributor) commented Feb 18, 2026

Bonk workflow failed. Check the logs for details.

View workflow run

@deathbyknowledge deathbyknowledge marked this pull request as ready for review February 18, 2026 14:57
@ask-bonk (bot, Contributor) commented Feb 18, 2026

I now have a thorough understanding of the PR. Let me write up my review.


PR Review: Allow simultaneous inputs on AIChat

Summary

This PR replaces the _streamCompletionPromise/_streamCompletionResolve mechanism with a request queue (_chatRequestQueue + _enqueueChatRequest + _drainChatRequestQueue) that serializes all onChatMessage → _reply cycles. This is a solid architectural improvement.

What's Good

  1. The core queue design is clean and correct. The _enqueueChatRequest / _drainChatRequestQueue pattern is a well-understood serialization approach. The while loop with shift() in _drainChatRequestQueue naturally handles items added during processing.

  2. Eliminates the 500ms timeout hack. The old code had a fragile setTimeout(resolve, 500) fallback when _streamCompletionPromise was null (a known race). The queue approach is strictly better because it serializes by construction — no timers needed.

  3. Consistent application. All three call sites (initial chat request, tool result continuation, tool approval continuation) use the same _enqueueChatRequest path, making the control flow uniform and easier to reason about.

  4. Proper cleanup. Queue is cleared on destroy() and CF_AGENT_CHAT_CLEAR, matching the lifecycle of other state.

Issues

1. Fire-and-forget _enqueueChatRequest — uncaught rejections if DO hibernates (Medium)

packages/ai-chat/src/index.ts:433

The chat request handler now calls this._enqueueChatRequest(...) and immediately returns. The _drainChatRequestQueue method is async but its returned promise is never awaited or passed to ctx.waitUntil(). The comment says "the queue drains via ctx.waitUntil" but I don't see that actually happening.

In the old code, the handler returned the result of this._tryCatchChat(...), which meant the Durable Object's WebSocket handler awaited the full reply. Now, the onMessage handler returns undefined while _drainChatRequestQueue runs in the background. If the DO hibernates or the isolate is evicted mid-drain, queued work silently disappears.

The same issue applies to tool result and tool approval continuations — they were already fire-and-forget via .then(), so this isn't a regression there, but the initial chat request path is a regression.

Suggestion: Either:

  • Have _enqueueChatRequest call this.ctx.waitUntil(drainPromise) so the runtime knows work is outstanding, or
  • Return the drain promise from the onMessage handler for the initial chat request case.
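The first suggestion can be sketched with a mocked execution context. Only ctx.waitUntil is the real Durable Object API here; QueuedAgent and its members are hypothetical stand-ins for the PR's internals:

```typescript
// Sketch of the waitUntil suggestion: register the drain promise with the
// runtime so queued work isn't silently dropped on hibernation/eviction.
// ExecutionContextLike mocks the Durable Object context for illustration.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}

class QueuedAgent {
  private queue: Array<() => Promise<void>> = [];
  private inFlight = false;

  constructor(private ctx: ExecutionContextLike) {}

  enqueue(work: () => Promise<void>): void {
    this.queue.push(work);
    if (!this.inFlight) {
      // Hand the drain promise to the runtime so it knows work is
      // outstanding, instead of fire-and-forget.
      this.ctx.waitUntil(this.drain());
    }
  }

  private async drain(): Promise<void> {
    this.inFlight = true;
    try {
      while (this.queue.length > 0) {
        await this.queue.shift()!();
      }
    } finally {
      this.inFlight = false;
    }
  }
}
```

Because `inFlight` is set synchronously before the first `await`, only one drain promise is ever registered per burst of enqueues.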

2. _drainChatRequestQueue is called but not awaited (Low)

packages/ai-chat/src/index.ts:739

if (!this._chatRequestInFlight) {
  this._drainChatRequestQueue(); // async, not awaited
}

This is intentional (fire-and-forget kickoff), but combined with issue #1, it means there's no back-pressure. If the caller doesn't hold the promise, the only thing keeping the drain alive is the event loop. Consider storing the drain promise so destroy() can await it for graceful shutdown.

3. Stale connection closure in queued callbacks (Low-Medium)

packages/ai-chat/src/index.ts:433-478

The queued callback captures connection from the enclosing onMessage scope. If multiple messages arrive from different connections, each queued callback closes over its respective connection. This is correct for preserving the sender's identity, but worth noting: if a connection disconnects while its request is still queued, the agentContext.run will execute with a stale connection reference. The abortSignal won't fire because cancellation requires an explicit CF_AGENT_CHAT_REQUEST_CANCEL message.

This isn't a bug per se (the old code had the same behavior with a single in-flight request), but the queue makes it more likely since requests can wait longer. Consider checking connection.readyState before executing a queued callback, or wiring up the abort signal to connection close.

4. Missing changeset (Low)

The changeset bot flagged this — this is a behavioral change to @cloudflare/ai-chat (fixes a race condition) and should have a patch changeset.

Nits

  • The → in the JSDoc for _enqueueChatRequest (onChatMessage → _reply cycle) was changed from -> in the diff. Either is fine, but keep it consistent with the rest of the codebase (the rest uses -> or plain English).

Missing Test Coverage

There are no tests for the actual queueing behavior — i.e., verifying that two concurrent chat requests are processed sequentially rather than interleaved. The existing client-tools-continuation.test.ts exercises the continuation path but doesn't send overlapping requests. A test that sends two CF_AGENT_USE_CHAT_REQUEST messages back-to-back and asserts both responses complete in order would be valuable.
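A standalone simulation shows the shape of the suggested assertion; SerialAgent and its chunk format are invented for illustration, and the real test would drive CF_AGENT_USE_CHAT_REQUEST messages through the project's WebSocket harness instead:

```typescript
// Simulate two overlapping chat requests through a serializing queue and
// record streamed chunks so a test can assert they never interleave.
type Chunk = { id: string; part: number };

class SerialAgent {
  private queue: Array<() => Promise<void>> = [];
  private draining = false;
  readonly log: Chunk[] = [];

  request(id: string): void {
    this.queue.push(async () => {
      for (let part = 0; part < 3; part++) {
        // Simulate async streaming work with small variable delays.
        await new Promise((r) => setTimeout(r, Math.random() * 5));
        this.log.push({ id, part });
      }
    });
    if (!this.draining) void this.drain();
  }

  private async drain(): Promise<void> {
    this.draining = true;
    while (this.queue.length > 0) {
      await this.queue.shift()!();
    }
    this.draining = false;
  }
}
```

The assertion worth making is on the chunk log, not just the completion order: all of request 1's chunks must precede all of request 2's.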

Verdict

The approach is correct and the code is clean. The main concern is issue #1 — the initial chat request is now fire-and-forget where it wasn't before, which could cause silent work loss on hibernation. If that's addressed (even just a ctx.waitUntil), this is good to merge with a patch changeset.

github run

@elithrar (Collaborator) commented:

@ask-bonk (bot, Contributor) commented Feb 18, 2026

I read the PR diff, the changeset file, and key sections of packages/ai-chat/src/index.ts (the queue implementation, destroy method, cancel logic, and onMessage handler), along with the existing test files, to understand the full context. I then wrote up a review confirming the queue approach is correct while flagging that the ctx.waitUntil comment is misleading: no such call exists, making the drain fire-and-forget and potentially lossy on hibernation.

github run

@threepointone (Contributor) commented:

May I please have some tests

@threepointone (Contributor) commented:

should we update the ai-chat example for this? what's the UX like? also, is this behaviour always desirable / should it be a flag?

@deathbyknowledge (Member, Author) commented:

> May I please have some tests

Donezo. Added one to check response order of new messages and one to make sure the queue empties on clear

Tests verify that:
- Multiple simultaneous messages are processed in order without interleaving
- Queue is cleared when chat history is cleared
@mattzcarey (Contributor) commented Feb 27, 2026

Matt's AI review:

Critical concerns:

  1. No abort on clear/close — The queue is cleared but the currently in-flight request keeps running. After a CF_AGENT_CHAT_CLEAR, the active request can still write messages post-clear. Need to abort it or gate on a generation token.

  2. Errors are swallowed — Previously _tryCatchChat was awaited directly. Now errors inside queued handlers are only logged in _drainChatRequestQueue and never propagate to the call site. This changes behavior for anything expecting failures to surface (metrics, retries, WS handler lifecycle).

  3. Stream completion timing — The old explicit wait for stream completion was removed. If _reply resolves before the stream is fully persisted, the next queued message could start too early, re-introducing the interleaving bug the queue was intended to prevent. Worth verifying _reply resolves only after stream end/persistence.

  4. Silent message loss on clear — this._chatRequestQueue.length = 0 drops queued requests with no response or error sent back to the client. Could lead to "lost" user inputs with no feedback.
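The generation-token idea in concern #1 can be sketched minimally; all names here are hypothetical:

```typescript
// A generation counter that invalidates in-flight work on clear: the
// counter is bumped on clear, and any request holding an older token
// must discard its writes.
class GenerationGate {
  private generation = 0;

  // Called on CF_AGENT_CHAT_CLEAR: invalidates all outstanding tokens.
  clear(): void {
    this.generation++;
  }

  // Captured by a request when it starts.
  snapshot(): number {
    return this.generation;
  }

  // Checked before each persisted write; false means a clear happened.
  isCurrent(token: number): boolean {
    return token === this.generation;
  }
}
```

A request would capture `gate.snapshot()` when it starts and check `gate.isCurrent(token)` before each write, so that after a clear bumps the generation, late writes from the pre-clear request are discarded instead of landing in fresh history.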

Test gaps:

  • Ordering test only checks done messages, not that partial chunks aren't interleaved — could false-positive if chunks interleave but finish in order
  • Clear test doesn't assert that req-1 was actually dropped or that no response arrives for it after clear
  • Random delay in TestChatAgent could cause flakes — deterministic delays tied to message content would be more robust

@mattzcarey (Contributor) commented Feb 27, 2026

looks good otherwise. preemptively stamping but some of the above could be good to finalise.

@threepointone (Contributor) commented:

> should we update the ai-chat example for this? what's the UX like? also, is this behaviour always desirable / should it be a flag?

Still looking for answers on this

@deathbyknowledge (Member, Author) commented:

> should we update the ai-chat example for this? what's the UX like? also, is this behaviour always desirable / should it be a flag?

My bad, I never saw this comment. No changes to the examples are needed because the UX hasn't changed at all; the changes are only to the AIChat private internals.

Right now, if you send 2 messages you'll get interleaved stream chunks in the same stream, and the data race produces garbage outputs, so I don't think adding a flag for it is worth it.

That being said, we might want to eventually add a flag to decide when queued messages are processed (after the first tool response or after the final LLM response). Right now only final LLM is available so I think that can be done in a different PR?

@threepointone (Contributor) commented:

hokay. I'll review this later today.

@deathbyknowledge deathbyknowledge marked this pull request as draft March 2, 2026 10:00


Development

Successfully merging this pull request may close these issues.

ai chat agent should handle multiple inputs coming in at the same time

4 participants