Bug Report
Description
When running LiteLLM with multiple replicas, MCP server configuration changes made via the API (PUT /v1/mcp/server) or UI are only reflected on the pod that handled the request. Other pods retain stale in-memory state until restarted.
This affects all MCP server properties: `available_on_public_internet`, `transport`, `url`, `auth_type`, `allowed_tools`, etc.
Models do not have this issue because they have a periodic `add_deployment` job that re-syncs from DB every 30 seconds. MCP servers have no equivalent polling job — they are only loaded from DB at startup (`_init_mcp_servers_in_db`).
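The staleness mechanism can be illustrated with a minimal sketch. Every name below is illustrative rather than a LiteLLM internal: each dict stands in for one pod's in-memory MCP registry, loaded once from the DB at startup.

```python
# Minimal sketch of the multi-replica staleness. Each pod keeps its own
# in-memory copy of the MCP server registry, loaded once from the DB;
# all names here are stand-ins, not LiteLLM internals.

db = {"my-mcp": {"available_on_public_internet": False}}

# Both pods load the registry at startup (the _init_mcp_servers_in_db step).
pod_a = {k: dict(v) for k, v in db.items()}
pod_b = {k: dict(v) for k, v in db.items()}

def update_server_on(pod, name, **changes):
    """A PUT /v1/mcp/server handled by one pod: it writes the DB but
    updates only that pod's local registry."""
    db[name].update(changes)
    pod[name].update(changes)

# The load balancer routes the PUT to pod A only.
update_server_on(pod_a, "my-mcp", available_on_public_internet=True)

print(pod_a["my-mcp"]["available_on_public_internet"])  # True
print(pod_b["my-mcp"]["available_on_public_internet"])  # False (stale)
```

Pod B's copy stays stale indefinitely because nothing ever rereads the DB after startup.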
Steps to Reproduce
- Deploy LiteLLM with 2+ replicas, `store_model_in_db: true`, DB connected
- Register an MCP server via API or UI (e.g., `POST /v1/mcp/server`)
- All pods load the server at startup; this works correctly
- Update the server via API or UI (e.g., `PUT /v1/mcp/server` to change `available_on_public_internet`)
- Subsequent requests hit different pods; some see the update, others don't
Expected Behavior
All pods should reflect MCP server config changes within a reasonable interval (e.g., 30s, similar to model deployments).
Actual Behavior
Only the pod that handled the update request reflects the change. Other pods serve stale config until restarted.
Workaround
Restart all pods after any MCP server configuration change:
kubectl rollout restart deployment/litellm-deployment -n litellm
Root Cause
- `mcp_server_manager.update_server()` updates the in-memory registry on the local pod only
- No periodic job exists to re-sync MCP servers from DB (unlike `add_deployment` for models at `proxy_server.py:5155`)
- `_init_mcp_servers_in_db()` only runs at startup
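A fix could mirror the models' periodic `add_deployment` pattern: a background task that reloads MCP servers from the DB on an interval. A hedged sketch follows; `fetch_mcp_servers_from_db`, `mcp_registry`, and the `iterations` parameter are stand-ins for demonstration, not LiteLLM's actual API.

```python
import asyncio

# Sketch of a periodic MCP re-sync loop, modeled on the 30-second
# add_deployment job used for model deployments. The DB fetch is stubbed.

mcp_registry = {}

async def fetch_mcp_servers_from_db():
    # Placeholder for the real DB query; returns the authoritative config.
    return {"my-mcp": {"transport": "http", "available_on_public_internet": True}}

async def resync_mcp_servers(interval_seconds: float = 30.0, iterations=None):
    """Replace the local in-memory registry with DB state every interval.
    iterations=None runs forever (as a proxy background task would)."""
    n = 0
    while iterations is None or n < iterations:
        fresh = await fetch_mcp_servers_from_db()
        mcp_registry.clear()
        mcp_registry.update(fresh)
        n += 1
        if iterations is None or n < iterations:
            await asyncio.sleep(interval_seconds)

# One pass for demonstration; in the proxy this would be scheduled at startup.
asyncio.run(resync_mcp_servers(iterations=1))
print(mcp_registry["my-mcp"]["transport"])  # http
```

With such a loop running on every replica, all pods would converge on the updated config within one interval, matching the 30-second behavior models already get.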
Environment
- LiteLLM version: 1.81.12
- Deployment: GKE, 5 replicas behind load balancer
- `store_model_in_db: true`
- Database: PostgreSQL