
fix: allow 429 RateLimitError to trigger MidStreamFallbackError in streaming#22297

Open
CSteigstra wants to merge 1 commit into BerriAI:main from CSteigstra:fix/429-streaming-midstream-fallback

Conversation

@CSteigstra
Contributor

Summary

PR #18698 introduced a blanket 4xx filter in CustomStreamWrapper.__anext__() that prevents all 400-499 status codes from being wrapped in MidStreamFallbackError. While correct for non-retriable client errors (400, 401, 403, 404), 429 (rate-limit) is fundamentally transient and should trigger the Router's fallback system to switch to a different model group.

```python
# Works for 5xx errors (503, 529):
# → MidStreamFallbackError → Router catches → falls back ✅

# Broken for 429:
# → RateLimitError raised directly → no fallback ❌
```

Changes

  • litellm/litellm_core_utils/streaming_handler.py: Exclude 429 from the 4xx filter in __anext__() so rate-limit errors raise MidStreamFallbackError instead of RateLimitError directly. Other 4xx errors (400, 401, 403, 404) still raised directly as before.
  • litellm/router.py: When MidStreamFallbackError has is_pre_first_chunk=True or empty generated_content (e.g. 429 before any tokens), skip the continuation prompt and retry with original messages. Previously this always appended a "continue from this text:" system message with empty content, wasting ~100 tokens.
  • Tests: Added test_vertex_streaming_rate_limit_triggers_midstream_fallback (streaming handler) and test_acompletion_streaming_iterator_pre_first_chunk_skips_continuation (router). Updated existing edge case test for new empty-content behavior.
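The filter change above can be sketched as a small predicate. This is an illustrative sketch only — the function name and shape are hypothetical, and litellm's actual `__anext__()` implementation inlines this check differently:

```python
# Hypothetical helper illustrating the 4xx-filter change; the real code in
# litellm/litellm_core_utils/streaming_handler.py differs in structure.
def should_wrap_in_midstream_fallback(mapped_status_code: int) -> bool:
    """Return True when a streaming error should be wrapped in
    MidStreamFallbackError so the Router's fallback system can engage."""
    is_4xx = 400 <= mapped_status_code <= 499
    # Before the fix: every 4xx was excluded from wrapping.
    # After the fix: 429 is carved out, since rate limits are transient
    # and retriable on a different model group.
    return (not is_4xx) or mapped_status_code == 429

assert should_wrap_in_midstream_fallback(503) is True   # 5xx: wrap, fall back
assert should_wrap_in_midstream_fallback(429) is True   # rate limit: now wraps
assert should_wrap_in_midstream_fallback(401) is False  # auth error: raise directly
```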

Fixes #22296
Relates to #20870, #18229, #8648, #6532

fix: allow 429 RateLimitError to trigger MidStreamFallbackError in streaming

PR BerriAI#18698 introduced a blanket 4xx filter that prevents all 400-499 status
codes from being wrapped in MidStreamFallbackError during async streaming.
While this is correct for non-retriable client errors (400, 401, 403, 404),
429 (rate-limit) is fundamentally transient and should trigger the Router's
fallback system to switch to a different model group.

Changes:
1. streaming_handler.py: Exclude 429 from the 4xx filter in __anext__()
   so rate-limit errors raise MidStreamFallbackError instead of
   RateLimitError directly.
2. router.py: When MidStreamFallbackError has is_pre_first_chunk=True or
   empty generated_content (e.g. 429 before any tokens), skip the
   continuation prompt and retry with original messages. Previously this
   always appended a "continue from this text:" system message with empty
   content, wasting ~100 tokens.

Fixes BerriAI#20870
Relates to BerriAI#18229, BerriAI#8648, BerriAI#6532
@vercel

vercel bot commented Feb 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| litellm | Ready | Preview, Comment | Feb 27, 2026 4:48pm |


@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR fixes 429 (rate-limit) errors during streaming being raised directly as RateLimitError instead of being wrapped in MidStreamFallbackError, which prevented the Router's fallback system from switching to a different model group. The fix excludes 429 from the blanket 4xx filter introduced in PR #18698 and adds an optimization to skip the continuation prompt when no content was generated before the error.

  • streaming_handler.py: Added `and mapped_status_code != 429` to both 4xx checks so rate-limit errors flow through to MidStreamFallbackError wrapping, consistent with `_should_retry(429)` returning True
  • router.py: When is_pre_first_chunk=True or generated_content is empty, retries with original messages instead of appending a wasteful continuation prompt with empty content
  • Behavioral change: The empty-content optimization applies to all MidStreamFallbackError cases (not just 429), improving token efficiency for any error that occurs before content is generated
  • Tests: Two new mock-only regression tests cover the 429-to-MidStreamFallbackError conversion and the pre-first-chunk message handling; one existing edge-case test updated for the new empty-content behavior
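The router-side behavior described above can be illustrated with a small sketch. The helper name and message shape here are hypothetical (litellm/router.py's real implementation differs); it only demonstrates the branch on `is_pre_first_chunk` / empty `generated_content`:

```python
# Illustrative sketch of the retry-message decision; names are hypothetical.
def build_retry_messages(original_messages, generated_content, is_pre_first_chunk):
    """Decide which messages to send to the fallback deployment after a
    MidStreamFallbackError."""
    if is_pre_first_chunk or not generated_content:
        # Nothing was streamed yet (e.g. a 429 before the first token):
        # retry with the untouched original messages, skipping the
        # continuation prompt entirely.
        return list(original_messages)
    # Otherwise ask the fallback model to continue the partial output.
    return list(original_messages) + [
        {"role": "system",
         "content": f"continue from this text: {generated_content}"}
    ]

msgs = [{"role": "user", "content": "hello"}]
assert build_retry_messages(msgs, "", is_pre_first_chunk=True) == msgs
assert build_retry_messages(msgs, "partial answer", False)[-1]["role"] == "system"
```

Under the old behavior the second branch ran even with empty `generated_content`, appending a continuation prompt that carried no text.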

Confidence Score: 4/5

  • This PR is safe to merge — the changes are minimal, well-targeted, and backed by regression tests.
  • Score of 4 reflects: (1) the streaming_handler.py change is a simple, correct two-line addition that aligns with existing _should_retry() semantics; (2) the router.py change is a sensible optimization with a minor behavioral change for all empty-content MidStreamFallbackErrors (not just 429), but the change is clearly an improvement; (3) tests are mock-only and comprehensive; (4) downstream compatibility verified — _should_retry(429) returns True, and MidStreamFallbackError with status_code=429 flows correctly through the fallback system. Docked one point because the behavioral change to empty-content handling is broader than the PR title implies.
  • Pay attention to litellm/router.py — the empty-content condition change affects all MidStreamFallbackErrors, not just 429 rate-limit errors.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/litellm_core_utils/streaming_handler.py | Excludes 429 from the 4xx filter so rate-limit errors are wrapped in MidStreamFallbackError instead of raised directly. Minimal, targeted change consistent with `_should_retry()` semantics. |
| litellm/router.py | Adds a pre-first-chunk / empty-content check to skip the continuation prompt and use the original messages. Behavioral change for all empty-content MidStreamFallbackErrors, not just 429. Logic is sound and avoids wasting tokens. |
| tests/test_litellm/litellm_core_utils/test_streaming_handler.py | Adds a mock-only regression test for 429 rate limits triggering MidStreamFallbackError. No real network calls; follows existing test patterns. |
| tests/test_litellm/test_router.py | Adds a mock-only regression test for the pre-first-chunk skip behavior and updates an existing edge-case test for the new empty-content handling. No real network calls. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Router
    participant StreamWrapper as CustomStreamWrapper
    participant Provider as LLM Provider

    Client->>Router: acompletion(stream=True)
    Router->>StreamWrapper: iterate stream
    StreamWrapper->>Provider: make_call()
    Provider-->>StreamWrapper: 429 RateLimitError

    Note over StreamWrapper: Before fix: raise RateLimitError directly ❌
    Note over StreamWrapper: After fix: 429 excluded from 4xx filter

    StreamWrapper-->>Router: MidStreamFallbackError(is_pre_first_chunk=True)

    alt is_pre_first_chunk or empty generated_content
        Note over Router: Use original messages (skip continuation prompt)
    else has generated content
        Note over Router: Append continuation prompt + assistant prefix
    end

    Router->>Router: async_function_with_fallbacks_common_utils()
    Router->>Provider: Fallback to next model group
    Provider-->>Router: Successful response
    Router-->>Client: Stream fallback response ✅
```

Last reviewed commit: 257f29a


@greptile-apps greptile-apps bot left a comment


4 files reviewed, no comments


@CSteigstra CSteigstra marked this pull request as draft February 27, 2026 17:29
@CSteigstra
Contributor Author

Some failing checks. Can have a look later.

@CSteigstra
Contributor Author

> Some failing checks. Can have a look later.

Okay these are flaky tests unrelated to this PR.

@CSteigstra CSteigstra marked this pull request as ready for review February 27, 2026 17:43


Development

Successfully merging this pull request may close these issues.

[Bug]: 429 RateLimitError excluded from MidStreamFallbackError — streaming fallbacks don't fire for rate limits
