Feature Request
1. litellm.count_tokens() — Public SDK method for provider token counting APIs
Currently, LiteLLM has:
- `litellm.token_counter()` — local tokenizer (tiktoken), approximate
- `litellm.encode()` / `litellm.decode()` — local tokenization
- Provider-specific classes (`AnthropicTokenCounter`, `BedrockTokenCounter`, etc.) — internal, not user-facing
Missing: A simple public method that calls the provider's real token counting API, like litellm.completion() does for chat.
Proposed:

```python
# Same simplicity as litellm.completion()
result = litellm.count_tokens(
    model="bedrock/anthropic.claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
    system="You are a helpful assistant.",
    tools=[{"name": "read_file", "description": "...", "input_schema": {...}}],
)
# → {"input_tokens": 531}
```

This would:
- Auto-detect the provider from the model string
- Call the correct provider API (Anthropic, Bedrock, Gemini, Vertex AI, OpenAI)
- Fall back to local tokenizer if provider doesn't have an API
- Return the exact token count the model will consume
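The routing described above could be sketched roughly as follows. This is a hypothetical illustration, not LiteLLM's actual implementation: the provider-prefix set and the injected `call_provider_api` / `local_counter` callables are assumptions made so the dispatch logic is self-contained.

```python
from typing import Callable

# Providers with a native count-tokens endpoint (see the support table below).
PROVIDER_APIS = {"anthropic", "bedrock", "gemini", "vertex_ai", "openai"}


def count_tokens(
    model: str,
    messages: list,
    call_provider_api: Callable[..., dict],
    local_counter: Callable[..., int],
    **kwargs,
) -> dict:
    """Route to a provider token-counting API, else fall back to a local tokenizer."""
    # Auto-detect the provider from the "provider/model" prefix convention.
    provider = model.split("/", 1)[0] if "/" in model else "openai"
    if provider in PROVIDER_APIS:
        # Real implementation would hit the provider's count-tokens endpoint
        # (e.g. Anthropic's /v1/messages/count_tokens) for an exact count.
        return call_provider_api(provider=provider, model=model,
                                 messages=messages, **kwargs)
    # No provider API: approximate with the local tokenizer instead.
    return {"input_tokens": local_counter(model=model, messages=messages)}
```

In this sketch the provider call and the local fallback are passed in as callables so the dispatch can be exercised without network access; the real SDK method would bind them internally.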
2. Add OpenAI POST /v1/responses/input_tokens support
OpenAI now exposes a token counting endpoint. LiteLLM should support it as another provider in the token counting flow.
Provider support status:
| Provider | Has count_tokens API | LiteLLM support |
|---|---|---|
| Anthropic | ✅ /v1/messages/count_tokens | ✅ Supported |
| Bedrock (Claude) | ✅ /model/{id}/count-tokens | ✅ Supported |
| Vertex AI (Claude) | ✅ countTokens | ✅ Supported |
| Gemini | ✅ /models/{model}:countTokens | ✅ Supported |
| Azure AI (Claude) | ✅ /anthropic/v1/messages/count_tokens | ✅ Supported |
| OpenAI | ✅ /v1/responses/input_tokens | ❌ Not supported |
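To make the missing OpenAI integration concrete, here is a minimal sketch of building a request against the `/v1/responses/input_tokens` path listed above. Only the path comes from the table; the request body shape (`model` plus an `input` field, mirroring the Responses API) is an assumption for illustration, so the builder returns the URL and payload without sending anything.

```python
import json


def build_input_tokens_request(
    model: str,
    messages: list,
    base_url: str = "https://api.openai.com",
) -> tuple:
    """Return (url, headers, body) for a POST to the input_tokens endpoint.

    Body shape is assumed; verify against OpenAI's docs before relying on it.
    """
    url = f"{base_url}/v1/responses/input_tokens"
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"model": model, "input": messages}).encode("utf-8")
    return url, headers, body
```

A provider integration inside LiteLLM would wire a builder like this into the same token-counting flow the Anthropic and Bedrock providers already use.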
Motivation
Clients like Claude Code need exact token counts to manage context windows. Local tokenizers such as tiktoken are approximate and don't account for how models internally process tools, images, system prompts, etc.; the provider APIs return the exact count.
Having a simple litellm.count_tokens() would make this accessible to SDK users, not just proxy users.