
feat(logs) : Update export items endpoint to respect sampling & routing decision #7907

Open

manessaraj wants to merge 3 commits into master from
sarajmanes/feat/logs-696-update_export_items_endpoint_to_respect_routing_decision

Conversation

@manessaraj
Contributor

@manessaraj manessaraj commented Apr 27, 2026

Update the ExportTraceItems endpoint to respect the routing decision so that we can reduce the time window for scanned items based on the sampling config. This will help prevent Sentry request timeouts.

Pagination logic:

Pagination uses a keyset cursor encoded as a protobuf PageToken.filter_offset (AndFilter).
Every token carries the active [window_start, window_end) time range. Presence of a cursor is inferred from last_seen_item_id being non-empty.

Two shapes:

  • 2 filters: time window only. Used when advancing to the next flex-time slice (no rows seen yet in that window).
  • 7 filters: time window + 5 last-seen fields (project_id, item_type, timestamp, trace_id, item_id). Used when the page was full and the next request should continue after the last returned row.
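The two token shapes can be sketched as follows. This is a minimal illustration, not the real protobuf: the `Filter` dataclass and the `last_seen_*` column names are stand-ins assumed from the description above.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    # Simplified stand-in for one condition inside the AndFilter.
    column: str
    op: str
    value: object


def window_filters(window_start: int, window_end: int) -> list[Filter]:
    # 2-filter shape: every token carries the active [window_start, window_end) range.
    return [
        Filter("timestamp", ">=", window_start),
        Filter("timestamp", "<", window_end),
    ]


def full_page_filters(window_start: int, window_end: int, last_seen: dict) -> list[Filter]:
    # 7-filter shape: the time window plus the 5 last-seen keyset fields,
    # used when the previous page was full.
    filters = window_filters(window_start, window_end)
    for col in ("project_id", "item_type", "timestamp", "trace_id", "item_id"):
        filters.append(Filter(f"last_seen_{col}", "=", last_seen[col]))
    return filters
```

A request handler can then tell the two shapes apart purely by the filter count (2 vs 7), matching the description above.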

The keyset condition injected into the query is a tuple comparison:
WHERE (project_id, item_type, timestamp, trace_id, item_id) > (last_seen_project_id, last_seen_item_type, ...)
This matches the ORDER BY column order, so ClickHouse can seek directly without scanning already-returned rows.
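As a hedged sketch, the tuple comparison above could be rendered like this; the column ordering is the important part and must stay aligned with the query's ORDER BY (the actual endpoint builds this via protobuf filters, not raw SQL strings):

```python
# Keyset columns in the same order as the query's ORDER BY clause,
# so ClickHouse can seek directly past already-returned rows.
KEYSET_COLUMNS = ("project_id", "item_type", "timestamp", "trace_id", "item_id")


def keyset_condition(last_seen: dict) -> str:
    # Render the tuple comparison WHERE clause from the last-seen row values.
    cols = ", ".join(KEYSET_COLUMNS)
    vals = ", ".join(repr(last_seen[c]) for c in KEYSET_COLUMNS)
    return f"({cols}) > ({vals})"
```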
Next-token decision at end of each request:

  • Results filled the page → 7-filter token (same window, last row as cursor)
  • Results exhausted the window AND flex routing narrowed the original range → 2-filter token (advance to the remaining earlier slice)
  • Otherwise → end_pagination = True
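The three branches above can be summarized in a small decision function. This is an illustrative sketch only; the names (`rows_returned`, `routed_start`, the string tags) are assumed, not taken from the endpoint:

```python
def next_token(rows_returned, page_size, window_start, window_end,
               orig_start, is_flex, routed_start):
    """Return (token_kind, window) for the next request, or (None, None)."""
    if rows_returned == page_size:
        # Page filled: 7-filter token, same window, resume after the last row.
        return ("full_page", (window_start, window_end))
    if is_flex and routed_start is not None and routed_start > orig_start:
        # Window exhausted but flex routing narrowed the original range:
        # 2-filter token advancing to the remaining earlier slice.
        return ("next_slice", (orig_start, routed_start))
    # Nothing left to scan: end pagination.
    return (None, None)
```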

Update the export logic to respect routing decision to avoid scanning too much data and avoid timeouts
@linear-code

linear-code Bot commented Apr 27, 2026

@manessaraj manessaraj changed the title Sarajmanes/feat/logs 696 update export items endpoint to respect routing decision feat(logs) : Update export items endpoint to respect sampling & routing decision Apr 27, 2026
@xurui-c
Member

xurui-c commented Apr 27, 2026

In the PR description, could you describe the algo?

Comment thread snuba/web/rpc/v1/endpoint_export_trace_items.py
@manessaraj manessaraj marked this pull request as ready for review April 28, 2026 21:05
@manessaraj manessaraj requested review from a team as code owners April 28, 2026 21:05
Comment on lines +589 to +593
elif is_flex and routed is not None and routed.start_timestamp.seconds > orig_start:
next_token = ExportTraceItemsPageToken(
window_start_sec=orig_start,
window_end_sec=routed.start_timestamp.seconds,
).to_protobuf()
Bug: The pagination token's time window (window_start_sec, window_end_sec) is ignored on subsequent requests. The query re-uses the original time window, breaking pagination logic.
Severity: HIGH

Suggested Fix

When a page_token is present and contains window_start_sec and window_end_sec, use these values to constrain the query's time window. This ensures the query respects the time slice specified by the pagination token, rather than re-calculating it from the original request.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: snuba/web/rpc/v1/endpoint_export_trace_items.py#L589-L593

Potential issue: The code generates a page token containing a specific time window
(`window_start_sec`, `window_end_sec`) for the next page of results. However, when a
subsequent request uses this token, the time window values are deserialized but never
used to constrain the new query. Instead, the query's time window is re-calculated from
the original request metadata. This causes the pagination logic to fail, as it ignores
the intended time slice from the token, leading to re-scanning the same data or skipping
data entirely. The fields `window_start_sec` and `window_end_sec` are effectively dead
data after being read from the token.



@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit b46663e.

meta.CopyFrom(in_msg.meta)
meta.start_timestamp.CopyFrom(routing_decision.time_window.start_timestamp)
meta.end_timestamp.CopyFrom(routing_decision.time_window.end_timestamp)
return meta

Page token's time window is never applied to query

High Severity

The ExportTraceItemsPageToken encodes window_start_sec/window_end_sec but these values are never used when building the next query. _export_query_meta always derives the time window from routing_decision.time_window (or original in_msg.meta), completely ignoring the incoming page token's window. Similarly, w_start/w_end in _execute are derived from the routing decision, not from the page token. When flex routing emits a 2-filter token to advance to the earlier time slice [orig_start, routed_start), the subsequent request re-queries [routed_start, orig_end) instead, causing an infinite loop.

Additional Locations (1)

