
feat(spec): add renderTiming to McpUiToolMeta for deferred View rendering #553

Open

netanelavr wants to merge 1 commit into modelcontextprotocol:main from netanelavr:feat/render-timing

Conversation

@netanelavr

Summary

Adds a new renderTiming field to McpUiToolMeta that lets servers declare when a View should appear in the conversation. This addresses a gap in the spec: hosts have no standardized way to distinguish Views that should render immediately from Views that should appear only after the agent finishes its turn.

Problem

The current spec defines displayMode (inline / fullscreen / pip) for visual layout, but has no concept of temporal presentation — i.e., when to show the View. In agentic workflows where the LLM makes multiple sequential tool calls, some Views (e.g., "Apply to Site", confirmation dialogs) should only appear after the agent is done reasoning, to prevent premature user interaction.

Today, hosts that need this behavior must invent proprietary metadata fields. This PR standardizes the pattern.

Solution

New type and field on McpUiToolMeta:

type McpUiRenderTiming = "inline" | "end-of-turn";

interface McpUiToolMeta {
  resourceUri?: string;
  visibility?: McpUiToolVisibility[];
  renderTiming?: McpUiRenderTiming;  // NEW
}
  • "inline" (default) — render the View as soon as the tool returns
  • "end-of-turn" — defer rendering until the agent's turn is complete (no more tool calls)

Design decisions

  • Server-declared hint: The server has domain knowledge about whether its View needs deferred rendering; the host respects it but MAY ignore it (see the host-side sketch after this list)
  • Orthogonal to displayMode: Timing and layout are independent concerns — a View can be end-of-turn + fullscreen
  • Backward compatible: Optional field, defaults to "inline", existing tools are unaffected
  • Extensible: String union allows future values (e.g., "on-user-action") without breaking changes
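
To make the "server-declared hint" decision concrete, here is a rough host-side sketch (not normative; onToolResult, onTurnComplete, and renderView are hypothetical host internals) of deferring end-of-turn Views until the agent loop completes:

type PendingView = { toolCallId: string; meta: McpUiToolMeta };

const deferredViews: PendingView[] = [];

// Hypothetical hook invoked when a tool result carrying UI metadata arrives.
function onToolResult(toolCallId: string, meta?: McpUiToolMeta): void {
  if (meta?.renderTiming === "end-of-turn") {
    deferredViews.push({ toolCallId, meta }); // hold back until the turn completes
  } else {
    renderView(toolCallId, meta); // "inline" (default): render immediately
  }
}

// Hypothetical hook invoked once the agent makes no further tool calls.
function onTurnComplete(): void {
  for (const view of deferredViews.splice(0)) {
    renderView(view.toolCallId, view.meta); // deferred Views appear at the end of the turn
  }
}

declare function renderView(toolCallId: string, meta?: McpUiToolMeta): void; // host-specific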

Prior art

  • Elementor's Angie has shipped this pattern in production (as a vendor-specific _meta.ui.displayMode field with "inline" / "end-of-turn" values). This PR standardizes the concept.
  • Related to the deferred _meta["openai/toolInvocation/invoking"] / invoked fields tracked in "Protocol discrepancies between MCP Apps and Apps SDK" (#201), though those are status text rather than timing control.

Changes

  • src/spec.types.ts — add McpUiRenderTiming type and renderTiming field to McpUiToolMeta
  • src/types.ts — re-export new type and schema
  • specification/draft/apps.mdx — document Render Timing section and design decision
  • src/generated/* — auto-regenerated schemas (Zod + JSON Schema + tests)
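
For reference, the regenerated Zod schema should look roughly like the sketch below (illustrative only; the actual file is auto-generated from spec.types.ts, and the schema names here are assumptions):

import { z } from "zod";

const McpUiRenderTimingSchema = z.enum(["inline", "end-of-turn"]);

const McpUiToolMetaSchema = z.object({
  resourceUri: z.string().optional(),
  visibility: z.array(z.string()).optional(), // simplified; the real schema reuses the visibility enum
  renderTiming: McpUiRenderTimingSchema.optional(),
});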

Test plan

  • npm test — all 121 tests pass
  • npm run build — builds successfully including all examples
  • Schema generation produces correct Zod and JSON Schema for the new type
  • Type-level integration tests verify McpUiRenderTiming round-trips correctly

Made with Cursor

Add a new `renderTiming` field to `McpUiToolMeta` that lets servers
declare when a View should appear in the conversation:

- "inline" (default): render as soon as the tool returns
- "end-of-turn": defer rendering until the agent's turn is complete

This addresses a gap in the spec where hosts have no standardized way
to know whether a View should be shown immediately or after the agent
finishes its turn. Tools like "Apply to Site" need deferred rendering
to prevent premature user interaction while the agent is still making
additional tool calls.

This is orthogonal to the existing visual `displayMode`
(inline/fullscreen/pip) which controls layout, not timing.

Changes:
- spec.types.ts: add McpUiRenderTiming type and renderTiming field
- types.ts: re-export new type and schema
- specification/draft/apps.mdx: document Render Timing section and
  design decision
- generated/schema.*: auto-regenerated from types

Made-with: Cursor
@idosal
Contributor

idosal commented Mar 19, 2026

Thanks @netanelavr ! To understand the gap, could you please provide additional example cases that tool definition doesn't cover? For example, in your current example, I'd imagine the "approval" tool could be forced to be called after the reasoning by requiring the reason argument.

@liady
Contributor

liady commented Mar 19, 2026

@netanelavr just to make sure - currently the host renders the view immediately (and doesn't actually wait for the tool result). The decision of what to show inside the view is made by the view itself, based on the data it gets from the host (i.e., no data -> loading state, tool inputs -> state A, tool result -> state B).

This mechanism can theoretically be extended so that the host will send a new type of message to signal that it has finished reasoning (so that the view can respond to that).
What do you think? This might allow the most accurate visual feedback for the user.

So the view can change according to these lifecycle events:

  • The host decides to use the tool (renders the view)
  • The host calls the tool (streams tool inputs to the view)
  • The host receives the tool response (sends the tool result to the view)
  • The host finishes the agentic reasoning (sends a message to the view)
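
A hypothetical sketch of that last signal (not part of the spec; the message shape and names are invented for illustration), assuming the host talks to the View over postMessage:

// Hypothetical host -> view lifecycle messages; names are illustrative.
type UiLifecycleMessage =
  | { type: "tool-input"; payload: unknown }  // streamed tool inputs
  | { type: "tool-result"; payload: unknown } // final tool result
  | { type: "turn-complete" };                // new: agent finished reasoning

function notifyTurnComplete(viewFrame: HTMLIFrameElement): void {
  const msg: UiLifecycleMessage = { type: "turn-complete" };
  viewFrame.contentWindow?.postMessage(msg, "*"); // lets the view leave its waiting state
}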

@netanelavr
Author

> Thanks @netanelavr ! To understand the gap, could you please provide additional example cases that tool definition doesn't cover? For example, in your current example, I'd imagine the "approval" tool could be forced to be called after the reasoning by requiring the reason argument.

@idosal Thanks for the feedback! Here are a few cases where the tool definition alone doesn't cover the gap:

  • Multi-tool workflows with shared confirmation - Think of a flow like create_widget -> deploy_widget -> validate_widget. The "Publish to Site" view (attached to deploy_widget) should only show when everything is ready, but the server doesn't know the tool order upfront. If the view appears mid-turn, the user might publish before validation is done. We can try guiding it with a reason arg but we end up depending on LLM behavior instead of enforcing intent.

  • Views aren’t tied to a single tool - In my implementation, the view isn't a dedicated "publish_to_site" tool; it can attach to any tool via _meta.ui. For example, a create_widget tool can return both the content and a view to apply it. The tool runs early, but the view should only appear at the end.

Hopefully that makes it clearer.

@netanelavr
Author

> @netanelavr just to make sure - currently the host renders the view immediately (and doesn't actually wait for the tool result). The decision of what to show inside the view is made by the view itself, based on the data it gets from the host (i.e., no data -> loading state, tool inputs -> state A, tool result -> state B).
>
> This mechanism can theoretically be extended so that the host will send a new type of message to signal that it has finished reasoning (so that the view can respond to that). What do you think? This might allow the most accurate visual feedback for the user.
>
> So the view can change according to these lifecycle events:
>
> • The host decides to use the tool (renders the view)
> • The host calls the tool (streams tool inputs to the view)
> • The host receives the tool response (sends the tool result to the view)
> • The host finishes the agentic reasoning (sends a message to the view)

@liady Good question. The lifecycle-events approach is interesting, but there's a challenge around timing and positioning.

If the host renders the view immediately (even hidden/loading), it gets inserted at the tool call position. When the agent finishes and the view "activates", it's still stuck in the middle of the conversation.

With the proposed renderTiming hint, the host just waits until the loop ends, so the view naturally shows in the right place. With lifecycle events alone, you'd still need either the view to move itself in the DOM or the host to reposition it when it's done reasoning.

That said, lifecycle events are still valuable since they enable more advanced UI behavior. I see them as complementary, not a replacement. In fact, I've already implemented the reverse direction (view -> host signals) on our end for cases where the view needs to trigger host behavior without appearing in the chat.

Would it make sense to have both? renderTiming for the simple case, and lifecycle events for views that want more control over their presentation?
