
PostgreSQL connections silently dropped by RDS due to missing default idle connection lifetime config #22289

@nikogio

Description


When deploying LiteLLM (v1.81.9) on Kubernetes with an AWS RDS PostgreSQL instance, intermittent connection errors occur:

{"level":"ERROR","fields":{"message":"Error in PostgreSQL connection: Error { kind: Closed, cause: None }"},"target":"quaint::connector::postgres"}

This manifests as failed API requests that succeed on retry (typically after a few attempts), causing visible latency spikes (P95/P99) and roughly 50% server error rates under moderate load.

Root Cause

RDS silently drops idle TCP connections after a period of inactivity. Quaint (Prisma's database driver) holds connections in its pool without validating them, and hands a dead connection to the next incoming request. The request fails with kind: Closed, retries on a fresh connection, and succeeds.

LiteLLM does not set a default value for max_idle_connection_lifetime in the DATABASE_URL, meaning connections can sit idle indefinitely and be silently dropped by RDS before LiteLLM recycles them.

Solution

Adding the following parameters to the DATABASE_URL resolves the issue completely:

postgresql://user:password@host:5432/dbname?max_idle_connection_lifetime=60&socket_timeout=10
  • max_idle_connection_lifetime=60 — quaint proactively closes connections idle for 60 seconds, before RDS drops them
  • socket_timeout=10 — if a stale connection slips through, the request fails fast rather than hanging

These are quaint-native URL parameters, confirmed supported in quaint's PostgreSQL connector.
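For deployments where DATABASE_URL is assembled from secrets or env vars, the parameters can be merged in programmatically rather than hand-edited. A minimal Python sketch (the helper name `with_pool_params` is my own; the two URL parameters are the quaint ones described above) that preserves any parameters already present:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_pool_params(database_url: str,
                     idle_lifetime: int = 60,
                     socket_timeout: int = 10) -> str:
    """Append quaint pool parameters to a DATABASE_URL.

    Existing query parameters are preserved and take precedence,
    so a value the operator already set is never overwritten.
    """
    parts = urlsplit(database_url)
    query = dict(parse_qsl(parts.query))
    query.setdefault("max_idle_connection_lifetime", str(idle_lifetime))
    query.setdefault("socket_timeout", str(socket_timeout))
    return urlunsplit(parts._replace(query=urlencode(query)))

# Example: a bare RDS URL gains both parameters.
url = with_pool_params("postgresql://user:password@host:5432/dbname")
```

This could run in an init container or entrypoint script before the LiteLLM pod starts, so the fix does not depend on every operator remembering the parameters.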

Suggestion

LiteLLM should set a sensible default for max_idle_connection_lifetime when initialising the Prisma/quaint connection, given that managed cloud databases (RDS, Cloud SQL, Azure Database) routinely drop idle connections. Leaving it unconfigured means any LiteLLM deployment on a managed PostgreSQL instance will hit this error without a clear path to resolution.

A default of 60 seconds for max_idle_connection_lifetime would prevent this for most cloud deployments without meaningful performance impact.
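As a sketch of what such a default could look like inside LiteLLM's startup path (the function name `ensure_idle_lifetime` and the hook point are hypothetical, not existing LiteLLM code), the patch would only apply when the user has not set the parameter themselves:

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed default, per this report: recycle idle connections after 60s,
# before managed databases (RDS, Cloud SQL, etc.) silently drop them.
DEFAULT_IDLE_LIFETIME = "60"

def ensure_idle_lifetime(url: str, default: str = DEFAULT_IDLE_LIFETIME) -> str:
    """Return url with max_idle_connection_lifetime set, unless the
    operator already configured it explicitly in DATABASE_URL."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    if "max_idle_connection_lifetime" in params:
        return url  # respect the operator's explicit choice
    sep = "&" if parts.query else "?"
    return f"{url}{sep}max_idle_connection_lifetime={default}"
```

Because the default is only appended when absent, operators who deliberately tuned the lifetime (or who run a database that never drops idle connections) see no behaviour change.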

Environment

  • LiteLLM version: 1.81.9
  • Deployment: Kubernetes (2 replicas)
  • Database: AWS RDS PostgreSQL (db.t4g.small)
  • Default connection pool: 10 per pod (20 total)
