Conversation
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@greptile can you please review this PR?
Greptile Summary

Introduces an optimization to reduce health check token consumption by defaulting `max_tokens` to 1 for non-wildcard models, while maintaining a safe default of 10 for wildcard routes. Adds a configurable `health_check_max_tokens` setting in `model_info` for user customization.

Key Changes:

Minor Issue: documentation for the new `health_check_max_tokens` option has not yet been added (see the inline comment below).

Confidence Score: 5/5
| Filename | Overview |
|---|---|
| litellm/proxy/health_check.py | Adds configurable max_tokens with sensible defaults (1 for standard models, respects wildcards) |
| litellm/litellm_core_utils/health_check_helpers.py | Updated to respect existing max_tokens from upstream, maintains 10 as fallback for wildcards |
| tests/test_litellm/proxy/test_health_check_max_tokens.py | Comprehensive tests covering default behavior, custom overrides, and wildcard safety - properly mocked |
Last reviewed commit: 2553698
```python
_health_check_max_tokens = model_info.get("health_check_max_tokens", None)
if _health_check_max_tokens is not None:
    litellm_params["max_tokens"] = _health_check_max_tokens
elif "*" not in (
    model_info.get("health_check_model") or litellm_params.get("model") or ""
):
    litellm_params["max_tokens"] = 1
```
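The priority order in this diff can be sketched as a standalone function. The helper name and signature below are illustrative only, not the actual LiteLLM code path:

```python
def resolve_health_check_max_tokens(model_info: dict, litellm_params: dict) -> dict:
    """Sketch of the health-check max_tokens defaulting logic.

    Priority: an explicit health_check_max_tokens override wins; otherwise
    non-wildcard models default to 1; wildcard routes are left unset so the
    upstream fallback (10) applies.
    """
    max_tokens = model_info.get("health_check_max_tokens")
    if max_tokens is not None:
        litellm_params["max_tokens"] = max_tokens
    elif "*" not in (
        model_info.get("health_check_model") or litellm_params.get("model") or ""
    ):
        litellm_params["max_tokens"] = 1
    return litellm_params
```

Note that a wildcard anywhere in the resolved model name (e.g. `azure/*`) skips the `max_tokens: 1` default, since the concrete model behind the route is not known at health-check time.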
Add documentation for the new `health_check_max_tokens` config option in /docs/my-website/docs/proxy/health.md so users know they can configure this.
Relevant issues
Address health check token overconsumption by introducing a configurable limit and a sensible default for non-wildcard models.
Pre-Submission checklist
- Added tests in the `tests/litellm/` directory (`tests/test_litellm/proxy/test_health_check_max_tokens.py`)
- `make test-unit`
- @greptileai
- CI (LiteLLM team)
Type
🆕 New Feature
Changes
- Defaults to `max_tokens: 1` for standard health checks. This prevents health checks (like Azure OpenAI) from generating long responses, saving cost and reducing latency.
- Adds a configurable `health_check_max_tokens` in `model_info`. Users can now explicitly set the token limit for health checks in their `config.yaml`.
- Keeps the default of `10` for wildcard-route models.
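For illustration, a hedged sketch of how the override might look in `config.yaml` (model names and keys below are placeholders, not from this PR):

```yaml
model_list:
  - model_name: azure-gpt
    litellm_params:
      model: azure/gpt-4o
      api_key: os.environ/AZURE_API_KEY
    model_info:
      # Optional override; without it, non-wildcard models default to 1
      health_check_max_tokens: 1
```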