feat(logs): Update export items endpoint to respect sampling & routing decision #7907
Conversation
Update the export logic to respect the routing decision, to avoid scanning too much data and to avoid timeouts.
In the PR description, could you describe the algo?
elif is_flex and routed is not None and routed.start_timestamp.seconds > orig_start:
    next_token = ExportTraceItemsPageToken(
        window_start_sec=orig_start,
        window_end_sec=routed.start_timestamp.seconds,
    ).to_protobuf()
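The flex-routing branch above can be reduced to a small pure function. This is an illustrative sketch only: `next_window` and its plain-integer second arguments are hypothetical stand-ins for the endpoint's protobuf timestamp fields, not the real snuba API.

```python
# Sketch of the flex-routing slice logic shown in the diff above: when routing
# narrowed the scan to [routed_start, orig_end), the next page token should
# cover the remaining earlier slice [orig_start, routed_start).
# `next_window` and its plain-int seconds arguments are hypothetical.
from typing import Optional, Tuple


def next_window(
    orig_start: int, routed_start: Optional[int], is_flex: bool
) -> Optional[Tuple[int, int]]:
    """Window the next page should scan, or None when nothing remains."""
    if is_flex and routed_start is not None and routed_start > orig_start:
        return (orig_start, routed_start)  # earlier, not-yet-scanned slice
    return None


print(next_window(100, 500, True))   # (100, 500)
print(next_window(100, 100, True))   # None: routing did not narrow the window
print(next_window(100, 500, False))  # None: non-flex requests never split
```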
Bug: The pagination token's time window (window_start_sec, window_end_sec) is ignored on subsequent requests. The query re-uses the original time window, breaking pagination logic.
Severity: HIGH
Suggested Fix
When a page_token is present and contains window_start_sec and window_end_sec, use these values to constrain the query's time window. This ensures the query respects the time slice specified by the pagination token, rather than re-calculating it from the original request.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: snuba/web/rpc/v1/endpoint_export_trace_items.py#L589-L593
Potential issue: The code generates a page token containing a specific time window
(`window_start_sec`, `window_end_sec`) for the next page of results. However, when a
subsequent request uses this token, the time window values are deserialized but never
used to constrain the new query. Instead, the query's time window is re-calculated from
the original request metadata. This causes the pagination logic to fail, as it ignores
the intended time slice from the token, leading to re-scanning the same data or skipping
data entirely. The fields `window_start_sec` and `window_end_sec` are effectively dead
data after being read from the token.
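The suggested fix amounts to a precedence rule for choosing the query's time window. A minimal sketch, assuming a token window should win over both the routing decision and the original request window; all names here are illustrative, not the actual endpoint code:

```python
# Hypothetical sketch of the suggested fix: when a page token carries a
# window, it must constrain the query, taking precedence over the routing
# decision and the original request metadata. Names are illustrative stand-ins
# for the real snuba/protobuf structures.
from typing import Optional, Tuple

Window = Tuple[int, int]  # (start_sec, end_sec)


def effective_window(
    token_window: Optional[Window],
    routed_window: Optional[Window],
    request_window: Window,
) -> Window:
    if token_window is not None:   # resume exactly the slice the token names
        return token_window
    if routed_window is not None:  # routing may have narrowed the scan
        return routed_window
    return request_window


print(effective_window((0, 10), (5, 20), (0, 20)))  # (0, 10): token wins
print(effective_window(None, (5, 20), (0, 20)))     # (5, 20): routing wins
print(effective_window(None, None, (0, 20)))        # (0, 20): request default
```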
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit b46663e.
meta.CopyFrom(in_msg.meta)
meta.start_timestamp.CopyFrom(routing_decision.time_window.start_timestamp)
meta.end_timestamp.CopyFrom(routing_decision.time_window.end_timestamp)
return meta
Page token's time window is never applied to query
High Severity
The ExportTraceItemsPageToken encodes window_start_sec/window_end_sec but these values are never used when building the next query. _export_query_meta always derives the time window from routing_decision.time_window (or original in_msg.meta), completely ignoring the incoming page token's window. Similarly, w_start/w_end in _execute are derived from the routing decision, not from the page token. When flex routing emits a 2-filter token to advance to the earlier time slice [orig_start, routed_start), the subsequent request re-queries [routed_start, orig_end) instead, causing an infinite loop.


Update the ExportTraceItems endpoint to respect the routing decision so that the time window for scanned items can be reduced based on the sampling config. This should help prevent Sentry requests from timing out.
Pagination logic:
Pagination uses a keyset cursor encoded as a protobuf PageToken.filter_offset (AndFilter). Every token carries the active [window_start, window_end) time range. Presence of a cursor is inferred from last_seen_item_id being non-empty. Two shapes:

A window-only token, carrying just the [window_start, window_end) range. Emitted when flex routing narrowed the scan and an earlier time slice still needs to be queried.

A full keyset token: (project_id, item_type, timestamp, trace_id, item_id). Used when the page was full and the next request should continue after the last returned row.

The keyset condition injected into the query is a tuple comparison:

WHERE (project_id, item_type, timestamp, trace_id, item_id) > (last_seen_project_id, last_seen_item_type, ...)

This matches the ORDER BY column order, so ClickHouse can seek directly without scanning already-returned rows.
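The tuple-comparison keyset can be demonstrated end to end with any engine that supports row values. The sketch below uses SQLite (row values require SQLite ≥ 3.15); the three-column table is a simplified stand-in for the five-column snuba sort key, not the real schema:

```python
import sqlite3

# Demo of the keyset condition described above, using SQLite row-value
# comparison. Table and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (project_id INT, ts INT, item_id INT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 10, 2), (1, 11, 1), (2, 10, 1)],
)


def page(cursor_row, limit=2):
    # The tuple comparison matches the ORDER BY column order, so the engine
    # can seek past already-returned rows instead of rescanning them.
    if cursor_row is None:
        sql = "SELECT * FROM items ORDER BY project_id, ts, item_id LIMIT ?"
        return conn.execute(sql, (limit,)).fetchall()
    sql = (
        "SELECT * FROM items WHERE (project_id, ts, item_id) > (?, ?, ?) "
        "ORDER BY project_id, ts, item_id LIMIT ?"
    )
    return conn.execute(sql, (*cursor_row, limit)).fetchall()


p1 = page(None)      # first page: [(1, 10, 1), (1, 10, 2)]
p2 = page(p1[-1])    # resume after last row: [(1, 11, 1), (2, 10, 1)]
print(p1, p2)
```

The last returned row doubles as the cursor for the next page, which is exactly the "continue after the last returned row" shape described above.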
Next-token decision at end of each request: