Litellm health check tokens #22299

Draft

Harshit28j wants to merge 2 commits into main from litellm_health_check_tokens
Conversation

Collaborator

@Harshit28j Harshit28j commented Feb 27, 2026

Relevant issues

Address health check token overconsumption by introducing a configurable limit and a sensible default for non-wildcard models.

Pre-Submission checklist

  • I have Added testing in the tests/litellm/ directory (Added tests/test_litellm/proxy/test_health_check_max_tokens.py)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai

CI (LiteLLM team)

  • Branch creation CI run
    Link:
  • CI run for the last commit
    Link:
  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature

Changes

  • Optimization: Updated litellm/proxy/health_check.py to default max_tokens: 1 for standard health checks. This prevents health checks (like Azure OpenAI) from generating long responses, saving cost and reducing latency.
  • New Config Setting: Introduced health_check_max_tokens in model_info. Users can now explicitly set the token limit for health checks in their config.yaml.
  • Wildcard Alignment: Updated litellm/litellm_core_utils/health_check_helpers.py to respect the configurable token limit while maintaining a safe default of 10 for wildcard-route models.
  • Testing: Added unit tests in tests/test_litellm/proxy/test_health_check_max_tokens.py covering default behavior, custom overrides, and wildcard routing safety.
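Based on the Changes list above, the new setting would presumably be wired up in config.yaml roughly as follows. This is a hedged sketch: the model names and the value 5 are illustrative; only the health_check_max_tokens key under model_info comes from the PR description.

```yaml
model_list:
  - model_name: gpt-4o                  # illustrative model entry
    litellm_params:
      model: azure/gpt-4o
    model_info:
      health_check_max_tokens: 5        # new: cap tokens spent per health check
  - model_name: openai-wildcard
    litellm_params:
      model: openai/*                   # wildcard route: keeps the safe default of 10
```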

github-actions bot and others added 2 commits February 27, 2026 10:50
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

vercel bot commented Feb 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm | Deployment: Building | Actions: Preview, Comment | Updated (UTC): Feb 27, 2026 6:10pm



CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ Harshit28j
❌ github-actions[bot]

@Harshit28j
Collaborator Author

@greptile can you please review this PR?


greptile-apps bot commented Feb 27, 2026

Greptile Summary

Introduces optimization to reduce health check token consumption by defaulting max_tokens to 1 for non-wildcard models while maintaining a safe default of 10 for wildcard routes. Adds configurable health_check_max_tokens setting in model_info for user customization.

Key Changes:

  • Standard health checks now default to 1 max token, reducing cost and latency
  • Wildcard models (like openai/*) maintain 10 token default to avoid errors with models like gpt-5-nano
  • New health_check_max_tokens config option allows per-model customization
  • Logic correctly checks both health_check_model and original model for wildcard detection
  • Comprehensive test coverage with proper mocking (no real network calls)
  • Poetry.lock version bump is routine maintenance

Minor Issue:

  • Documentation for the new health_check_max_tokens configuration option should be added to the health.md file

Confidence Score: 5/5

  • Safe to merge - well-designed optimization with proper fallbacks and test coverage
  • Logic is sound with proper wildcard detection, tests are comprehensive and properly mocked, no breaking changes (user config can override defaults), only minor documentation gap
  • No files require special attention - all changes are straightforward

Important Files Changed

  • litellm/proxy/health_check.py: Adds configurable max_tokens with sensible defaults (1 for standard models, respects wildcards)
  • litellm/litellm_core_utils/health_check_helpers.py: Updated to respect existing max_tokens from upstream; maintains 10 as fallback for wildcards
  • tests/test_litellm/proxy/test_health_check_max_tokens.py: Comprehensive tests covering default behavior, custom overrides, and wildcard safety; properly mocked

Last reviewed commit: 2553698

greptile-apps bot left a comment

4 files reviewed, 1 comment


Comment on lines +237 to +243
```python
_health_check_max_tokens = model_info.get("health_check_max_tokens", None)
if _health_check_max_tokens is not None:
    litellm_params["max_tokens"] = _health_check_max_tokens
elif "*" not in (
    model_info.get("health_check_model") or litellm_params.get("model") or ""
):
    litellm_params["max_tokens"] = 1
```
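To illustrate the precedence the reviewed lines implement, here is a standalone sketch. The function name and dict-in/dict-out shape are hypothetical (the real code mutates litellm_params inline); the branching logic mirrors the snippet above.

```python
def resolve_health_check_max_tokens(model_info: dict, litellm_params: dict) -> dict:
    """Apply the health-check token limit, mirroring the PR's logic.

    Precedence:
      1. an explicit health_check_max_tokens in model_info always wins;
      2. otherwise, non-wildcard models default to max_tokens=1;
      3. wildcard routes (e.g. "openai/*") are left untouched here, so the
         wildcard helper's safe default of 10 can apply downstream.
    """
    max_tokens = model_info.get("health_check_max_tokens", None)
    if max_tokens is not None:
        litellm_params["max_tokens"] = max_tokens
    elif "*" not in (
        model_info.get("health_check_model") or litellm_params.get("model") or ""
    ):
        litellm_params["max_tokens"] = 1
    return litellm_params


# Explicit override wins.
print(resolve_health_check_max_tokens({"health_check_max_tokens": 5}, {"model": "gpt-4o"}))
# Non-wildcard model falls back to max_tokens=1.
print(resolve_health_check_max_tokens({}, {"model": "azure/gpt-4o"}))
# Wildcard route: max_tokens is not set here.
print(resolve_health_check_max_tokens({}, {"model": "openai/*"}))
```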
add documentation for the new health_check_max_tokens config option in /docs/my-website/docs/proxy/health.md so users know they can configure this

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
