| 1 | +--- |
| 2 | +slug: httpx-cache-eviction-incident |
| 3 | +title: "Incident Report: Cache Eviction Closes In-Use httpx Clients" |
| 4 | +date: 2026-02-27T10:00:00 |
| 5 | +authors: |
| 6 | + - name: Ryan Crabbe |
| 7 | + title: Performance Engineer, LiteLLM |
| 8 | + url: https://www.linkedin.com/in/ryan-crabbe-0b9687214 |
| 9 | + - name: Ishaan Jaff |
| 10 | + title: "CTO, LiteLLM" |
| 11 | + url: https://www.linkedin.com/in/reffajnaahsi/ |
| 12 | + image_url: https://pbs.twimg.com/profile_images/1613813310264340481/lz54oEiB_400x400.jpg |
| 13 | + - name: Krrish Dholakia |
| 14 | + title: "CEO, LiteLLM" |
| 15 | + url: https://www.linkedin.com/in/krish-d/ |
| 16 | + image_url: https://pbs.twimg.com/profile_images/1298587542745358340/DZv3Oj-h_400x400.jpg |
| 17 | +tags: [incident-report, caching, stability] |
| 18 | +hide_table_of_contents: false |
| 19 | +--- |
| 20 | + |
| 21 | +**Date:** February 27, 2026 |
| 22 | +**Duration:** ~6 days (Feb 21 merge -> Feb 27 fix) |
| 23 | +**Severity:** High |
| 24 | +**Status:** Resolved |
| 25 | + |
| 26 | +> **Note:** This fix is available in LiteLLM `v1.81.14.rc.2` and later. |
| 27 | + |
| 28 | +## Summary |
| 29 | + |
| 30 | +A change intended to improve Redis connection pool cleanup introduced a regression that closed **httpx clients** the proxy was still actively using. The `LLMClientCache` (an in-memory TTL cache) stores both Redis clients *and* httpx clients under the same eviction policy. When a cache entry expired or was evicted, the new cleanup code called `aclose()`/`close()` on the evicted value. That worked correctly for Redis clients, but it destroyed httpx clients that other parts of the system still held references to and were using for LLM API calls. |
| 31 | + |
| 32 | +**Impact:** Any proxy instance that hit the cache TTL (default 10 minutes) or capacity limit (200 entries) would have its httpx clients closed out from under it, causing requests to LLM providers to fail with connection errors. |
| 33 | + |
| 34 | +--- |
| 35 | + |
| 36 | +## Background |
| 37 | + |
| 38 | +`LLMClientCache` extends `InMemoryCache` and is used to cache SDK clients (OpenAI, Anthropic, etc.) to avoid re-creating them on every request. These clients are keyed by configuration + event loop ID. The cache has: |
| 39 | + |
| 40 | +- **Max size:** 200 entries |
| 41 | +- **Default TTL:** 10 minutes |
| 42 | + |
| 43 | +When the cache is full or entries expire, `InMemoryCache.evict_cache()` calls `_remove_key()` to drop entries. |
| 44 | + |
| 45 | +The cached values are a mix of: |
| 46 | +- **Redis/async Redis clients** — owned exclusively by the cache, safe to close on eviction |
| 47 | +- **httpx-backed SDK clients** (OpenAI, Anthropic, etc.) — shared references, still in use by router/model instances |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +## Root Cause |
| 52 | + |
| 53 | +[PR #21717](https://github.com/BerriAI/litellm/pull/21717) overrode `_remove_key()` in `LLMClientCache` to close async clients on eviction: |
| 54 | + |
| 55 | +<details> |
| 56 | +<summary>Problematic code added in PR #21717</summary> |
| 57 | + |
| 58 | +```python |
| 59 | +class LLMClientCache(InMemoryCache): |
| 60 | + def _remove_key(self, key: str) -> None: |
| 61 | + value = self.cache_dict.get(key) |
| 62 | + super()._remove_key(key) |
| 63 | + if value is not None: |
| 64 | + close_fn = getattr(value, "aclose", None) or getattr(value, "close", None) |
| 65 | + if close_fn and asyncio.iscoroutinefunction(close_fn): |
| 66 | + try: |
| 67 | + asyncio.get_running_loop().create_task(close_fn()) |
| 68 | + except RuntimeError: |
| 69 | + pass |
| 70 | + elif close_fn and callable(close_fn): |
| 71 | + try: |
| 72 | + close_fn() |
| 73 | + except Exception: |
| 74 | + pass |
| 75 | +``` |
| 76 | + |
| 77 | +</details> |
| 78 | + |
| 79 | +The intent was correct for Redis clients — prevent connection pool leaks when cached Redis clients expire. But `LLMClientCache` also stores httpx-backed SDK clients (e.g., `AsyncOpenAI`, `AsyncAnthropic`). These clients: |
| 80 | + |
| 81 | +1. Have an `aclose()` method (inherited from httpx) |
| 82 | +2. Are still held by references elsewhere in the codebase (router, model instances) |
| 83 | +3. Were being closed without any check on whether they were still in use |
| 84 | + |
| 85 | +So when the cache evicted an entry, it called `aclose()` on an httpx client that was still serving active LLM requests. That closed the client's underlying transport, and requests made through the same client then failed with connection errors. |
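
The failure mode can be reproduced in miniature without any network calls. The sketch below is illustrative only: `FakeClient` and `ClosingCache` are hypothetical stand-ins, not LiteLLM or httpx classes, and the cache is reduced to a plain dict with a close-on-eviction hook in the spirit of the override shown above.

```python
class FakeClient:
    """Hypothetical stand-in for an httpx-backed SDK client."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def send(self):
        # A closed client can no longer serve requests.
        if self.closed:
            raise RuntimeError("client has been closed")
        return "ok"


class ClosingCache:
    """Toy cache (an assumption, not InMemoryCache) that closes values on eviction."""

    def __init__(self):
        self.cache_dict = {}

    def set(self, key, value):
        self.cache_dict[key] = value

    def _remove_key(self, key):
        value = self.cache_dict.pop(key, None)
        close_fn = getattr(value, "close", None)
        if callable(close_fn):
            close_fn()  # closes the object even if others still hold it


def reproduce():
    cache = ClosingCache()
    client = FakeClient()
    cache.set("openai-client", client)
    router_reference = client  # another component keeps its own reference

    cache._remove_key("openai-client")  # eviction closes the shared object

    try:
        router_reference.send()
    except RuntimeError as exc:
        return str(exc)
    return "no error"
```

The key point is that the cache and the router hold the same object, so closing it from the cache side is visible to every other holder.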
| 86 | + |
| 87 | +--- |
| 88 | + |
| 89 | +## The Fix |
| 90 | + |
| 91 | +[PR #22247](https://github.com/BerriAI/litellm/pull/22247) removed the `_remove_key` override entirely: |
| 92 | + |
| 93 | +<details> |
| 94 | +<summary>The fix (PR #22247)</summary> |
| 95 | + |
| 96 | +```diff |
| 97 | + class LLMClientCache(InMemoryCache): |
| 98 | +- def _remove_key(self, key: str) -> None: |
| 99 | +- """Close async clients before evicting them to prevent connection pool leaks.""" |
| 100 | +- value = self.cache_dict.get(key) |
| 101 | +- super()._remove_key(key) |
| 102 | +- if value is not None: |
| 103 | +- close_fn = getattr(value, "aclose", None) or getattr( |
| 104 | +- value, "close", None |
| 105 | +- ) |
| 106 | +- ... |
| 107 | +- |
| 108 | + def update_cache_key_with_event_loop(self, key): |
| 109 | +``` |
| 110 | + |
| 111 | +</details> |
| 112 | + |
| 113 | +The eviction now simply drops the reference and lets Python's GC handle cleanup, which is safe because: |
| 114 | +- httpx clients that are still referenced elsewhere stay alive |
| 115 | +- Unreferenced clients get cleaned up by GC naturally |
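
The reference-lifetime argument above can be checked with a small sketch using `weakref`. It assumes a hypothetical `FakeClient` class and a plain dict in place of the real cache, and it only demonstrates object lifetime, not how httpx releases its connections.

```python
import gc
import weakref


class FakeClient:
    """Hypothetical stand-in for a cached SDK client."""


def demo():
    cache = {}
    client = FakeClient()          # reference held by e.g. the router
    cache["client-key"] = client   # second reference held by the cache
    tracker = weakref.ref(client)  # observes the object without keeping it alive

    del cache["client-key"]        # eviction: only the cache's reference is dropped
    gc.collect()
    alive_after_eviction = tracker() is not None

    del client                     # the last strong reference goes away
    gc.collect()
    alive_after_release = tracker() is not None

    return alive_after_eviction, alive_after_release
```

Under CPython, `demo()` returns `(True, False)`: the object survives eviction while something else still references it, and is reclaimed only once the last reference is released.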
| 116 | + |
| 117 | +The other improvements from PR #21717 were kept: |
| 118 | +- **`max_connections` respected for URL-based Redis configs** (previously silently dropped) |
| 119 | +- **`disconnect()` now closes both sync and async Redis clients** (the sync client was previously leaked) |
| 120 | +- **Connection pool passthrough**: when a pool is provided with a URL config, it's used directly instead of creating a duplicate |
| 121 | + |
| 122 | +--- |
| 123 | + |
| 124 | +## Remediation |
| 125 | + |
| 126 | +| Action | Status | Code | |
| 127 | +|--------|--------|------| |
| 128 | +| Remove `_remove_key` override that closes shared clients on eviction | ✅ Done | [PR #22247](https://github.com/BerriAI/litellm/pull/22247) | |
| 129 | +| Add e2e test: evicted client still usable (capacity) | ✅ Done | [PR #22313](https://github.com/BerriAI/litellm/pull/22313) | |
| 130 | +| Add e2e test: expired client still usable (TTL) | ✅ Done | [PR #22313](https://github.com/BerriAI/litellm/pull/22313) | |
| 131 | + |
| 132 | +The e2e tests go through `get_async_httpx_client()`, the same code path the proxy uses in production, and assert that the client is still functional after eviction. They run in CI on every PR against `main`. If a future change to `LLMClientCache` eviction (an `_remove_key` override or any other client cleanup on eviction) closes a client that is still in use, these tests fail regardless of how the cleanup is implemented. |
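
The shape of such a regression test might look roughly like the following. This is a simplified sketch, not the test from PR #22313: `FakeClient` and `BoundedCache` are hypothetical, the capacity is set to 2 instead of 200, and the real tests exercise `get_async_httpx_client()` rather than a toy cache.

```python
from collections import OrderedDict


class FakeClient:
    """Hypothetical stand-in for an httpx-backed client."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def request(self):
        if self.closed:
            raise RuntimeError("client has been closed")
        return f"response from {self.name}"


class BoundedCache:
    """Toy capacity-limited cache (an assumption) that only drops references."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.cache_dict = OrderedDict()

    def set(self, key, value):
        if key not in self.cache_dict and len(self.cache_dict) >= self.max_size:
            self.cache_dict.popitem(last=False)  # evict oldest entry, never close it
        self.cache_dict[key] = value


def test_evicted_client_still_usable():
    cache = BoundedCache(max_size=2)
    first = FakeClient("first")  # the caller keeps its own reference
    cache.set("first", first)
    cache.set("second", FakeClient("second"))
    cache.set("third", FakeClient("third"))  # pushes "first" out of the cache

    assert "first" not in cache.cache_dict
    assert first.request() == "response from first"
    return True
```

The assertion targets observable behavior (the evicted client still answers requests) rather than a specific method, which is what lets a test of this shape catch any cleanup-on-eviction variant.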