
Retry Google 499 CANCELLED as transient #2566

Merged
hiroshinishio merged 1 commit into main from wes
Apr 21, 2026

Conversation

hiroshinishio (Collaborator) commented Apr 21, 2026

Summary

  • Added getattr(err, "code", None) == 499 to is_transient_error so Google's ClientError: 499 CANCELLED flows through the existing linear-backoff transient-retry path (2s/4s/6s, 3 attempts)
  • Reproduced from the 2026-04-20T17:36:33 gitautoai/website incident (AGENT-3JX/3JY/3K0/3JZ): Google's backend cancelled the stream server-side during the same free-tier overload window that produced the 429 cluster 1 hour earlier, but without a Retry-After hint
  • Added two tests using the real google_errors.ClientError shape: positive for 499 CANCELLED, negative for 400 INVALID_ARGUMENT (to prove we don't retry real client bugs)
  • Minor cleanup: two partial-assertion fixups from the prior rate-limit PR and a sentry-cli doc update in CLAUDE.md
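The check in the first bullet and the two tests in the third can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `FakeClientError` is a hypothetical stand-in that models only the `code` and `status` attributes (the real tests use `google_errors.ClientError`), and the real `is_transient_error` presumably checks other transient conditions that this description does not show.

```python
class FakeClientError(Exception):
    """Hypothetical stand-in for google_errors.ClientError; models only code and status."""

    def __init__(self, code: int, status: str) -> None:
        super().__init__(f"{code} {status}")
        self.code = code
        self.status = status


def is_transient_error(err: BaseException) -> bool:
    # Sketch of the added condition only. getattr with a default keeps the
    # check safe for exceptions that have no `code` attribute at all.
    return getattr(err, "code", None) == 499


# Positive case: server-side stream cancellation should be retried.
print(is_transient_error(FakeClientError(499, "CANCELLED")))
# Negative case: a real client bug must not be retried.
print(is_transient_error(FakeClientError(400, "INVALID_ARGUMENT")))
```

Using `getattr` rather than `err.code` means the predicate can be applied to arbitrary exceptions without raising `AttributeError`.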

Social Media Post (GitAuto)

Google 499 CANCELLED errors now retry instead of bubbling to Sentry

  • is_transient_error recognizes Gemini's server-side stream cancellations as retryable
  • Linear backoff via the existing transient-retry budget (3 attempts, 2s/4s/6s), no new config
  • Paired with the 429 retry shipped last week, free-tier overload windows now self-heal

Social Media Post (Wes)

Gemini's free tier has two flavors of overload: 429 with a "retry in 60s" hint, and 499 CANCELLED with no hint at all. Same Sentry cluster, same repo, 1 hour apart. Four lines plus two tests to make 499 take the transient-retry path we already had. Overload windows self-heal now.

Google's GenAI backend emits 499 CANCELLED when it closes the stream server-side, observed in the 2026-04-20 free-tier overload window on gitautoai/website (AGENT-3JX/3JY/3K0/3JZ). No Retry-After hint is provided, so route through is_transient_error's linear backoff rather than the rate-limit path. Also updates CLAUDE.md to simplify the sentry-cli workflow and rolls up two minor partial-assertion fixups from the prior rate-limit work.
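For readers unfamiliar with the retry path mentioned above, a linear backoff loop might look like the sketch below. The function name `call_with_linear_backoff` and its parameters are hypothetical, and it assumes "3 attempts, 2s/4s/6s" means up to three retries after the initial call, sleeping 2, 4, and 6 seconds before each; the repository's actual retry wrapper may be structured differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_linear_backoff(
    fn: Callable[[], T],
    is_transient: Callable[[BaseException], bool],
    max_retries: int = 3,
    step_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with delays of step, 2*step, 3*step."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as err:
            # Give up when the retry budget is spent or the error is not transient.
            if attempt >= max_retries or not is_transient(err):
                raise
            attempt += 1
            # Linear backoff: the delay grows by a fixed step on each retry.
            sleep(step_seconds * attempt)
```

The `sleep` parameter is injectable so tests can record the delays instead of actually waiting. Because a 499 carries no Retry-After hint, a fixed schedule like this is the only timing information available, which is why it goes through this path rather than the rate-limit one.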
hiroshinishio self-assigned this Apr 21, 2026
hiroshinishio merged commit 53d2cb2 into main Apr 21, 2026
1 check passed
hiroshinishio deleted the wes branch April 21, 2026 02:05

1 participant