
feat: add native Responses API support for hosted_vllm provider#22298

Merged
krrishdholakia merged 1 commit into BerriAI:litellm_oss_staging_02_28_2026 from anencore94:feat/hosted-vllm-responses-api
Feb 28, 2026

Conversation

@anencore94

Register HostedVLLMResponsesAPIConfig so that litellm.responses(model="hosted_vllm/...") routes directly to vLLM's /v1/responses endpoint instead of falling back to the chat completions → responses conversion pipeline.

Relevant issues

Relates to #19733 (stalled since 2025-01-27; this PR incorporates maintainer feedback on API key defaults and generic approach)

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature

Changes

  • New file litellm/llms/hosted_vllm/responses/transformation.py: HostedVLLMResponsesAPIConfig extending OpenAIResponsesAPIConfig with:
    • custom_llm_provider = LlmProviders.HOSTED_VLLM
    • validate_environment() — defaults to "fake-api-key" when no key is provided (matching existing HostedVLLMChatConfig pattern)
    • get_complete_url() — resolves HOSTED_VLLM_API_BASE env var and handles api_base with/without /v1 suffix
  • litellm/__init__.py — add TYPE_CHECKING export
  • litellm/_lazy_imports_registry.py — register in lazy import system
  • litellm/utils.py — add HOSTED_VLLM case to get_provider_responses_api_config()
  • tests/test_litellm/llms/hosted_vllm/responses/test_hosted_vllm_responses.py — add 5 new tests:
    • test_hosted_vllm_provider_config_registration
    • test_hosted_vllm_responses_api_url
    • test_hosted_vllm_responses_api_url_requires_api_base
    • test_hosted_vllm_validate_environment_default_api_key
    • test_hosted_vllm_validate_environment_custom_api_key
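The behavior described above (default API key, env-var fallback, /v1 suffix handling) can be sketched roughly as follows. This is a simplified, hypothetical stand-in, not the actual class in litellm/llms/hosted_vllm/responses/transformation.py:

```python
import os
from typing import Optional

FAKE_API_KEY = "fake-api-key"  # default noted in the PR, matching HostedVLLMChatConfig


class HostedVLLMResponsesAPIConfigSketch:
    """Illustrative stand-in for HostedVLLMResponsesAPIConfig (not the real class)."""

    def validate_environment(self, headers: dict, api_key: Optional[str] = None) -> dict:
        # Fall back to the env var, then to the fake key, per the PR description.
        key = api_key or os.environ.get("HOSTED_VLLM_API_KEY") or FAKE_API_KEY
        headers["Authorization"] = f"Bearer {key}"
        return headers

    def get_complete_url(self, api_base: Optional[str]) -> str:
        # Resolve HOSTED_VLLM_API_BASE, and accept api_base with or without /v1.
        base = api_base or os.environ.get("HOSTED_VLLM_API_BASE")
        if base is None:
            raise ValueError("api_base not set")
        base = base.rstrip("/")
        return f"{base}/responses" if base.endswith("/v1") else f"{base}/v1/responses"
```

Both `http://host:8000` and `http://host:8000/v1` resolve to the same `/v1/responses` endpoint, which is what the URL tests above exercise.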

Register HostedVLLMResponsesAPIConfig so that litellm.responses(model="hosted_vllm/...")
routes directly to vLLM's /v1/responses endpoint instead of falling back to the
chat completions → responses conversion pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Feb 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: litellm · Status: Ready · Actions: Preview, Comment · Updated (UTC): Feb 27, 2026 5:05pm


@CLAassistant

CLAassistant commented Feb 27, 2026

CLA assistant check
All committers have signed the CLA.

@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

Registers a new HostedVLLMResponsesAPIConfig so that litellm.responses(model="hosted_vllm/...") routes directly to vLLM's native /v1/responses endpoint, bypassing the chat-completions-to-responses conversion pipeline. The implementation closely follows the existing patterns used by other providers (GitHub Copilot, XAI, etc.) and mirrors the HostedVLLMChatConfig conventions for env var names and the "fake-api-key" default.

  • New config class in litellm/llms/hosted_vllm/responses/transformation.py extending OpenAIResponsesAPIConfig with vLLM-specific URL construction and API key defaults
  • Registration plumbing across __init__.py, _lazy_imports_registry.py, and utils.py — all follow established patterns
  • Test coverage with 7 tests (config registration, URL construction, env defaults, end-to-end mock). Two tests are fragile if HOSTED_VLLM_API_KEY or HOSTED_VLLM_API_BASE env vars are set in the runner environment

Confidence Score: 4/5

  • This PR is safe to merge after addressing the minor test fragility issues — the core implementation is clean and follows established patterns.
  • The implementation is minimal, well-structured, and follows existing patterns exactly. The only concerns are two tests that could be flaky in environments where HOSTED_VLLM_API_KEY or HOSTED_VLLM_API_BASE are set. The core transformation class and registration code are correct.
  • Pay attention to tests/test_litellm/llms/hosted_vllm/responses/test_hosted_vllm_responses.py — two tests need environment variable isolation to be robust in CI.

Important Files Changed

  • litellm/llms/hosted_vllm/responses/transformation.py: New HostedVLLMResponsesAPIConfig class extending OpenAIResponsesAPIConfig. Follows existing patterns from chat/embedding configs correctly: uses HOSTED_VLLM_API_BASE/KEY env vars, defaults to "fake-api-key", handles /v1 suffix in URL construction. Clean and well-structured.
  • litellm/__init__.py: Adds TYPE_CHECKING import for HostedVLLMResponsesAPIConfig, placed logically next to existing hosted_vllm imports. Correct pattern.
  • litellm/_lazy_imports_registry.py: Registers HostedVLLMResponsesAPIConfig in both the LLM_CONFIG_NAMES tuple and the _LLM_CONFIGS_IMPORT_MAP dict. Follows the established lazy import pattern correctly.
  • litellm/utils.py: Adds HOSTED_VLLM case to get_provider_responses_api_config() dispatch, following the same pattern as all other providers. Placed correctly before the final return None.
  • tests/test_litellm/llms/hosted_vllm/responses/test_hosted_vllm_responses.py: Good test coverage with 7 tests total. Two tests (default API key and missing api_base) are fragile: they don't mock or clear env vars, so they may fail if HOSTED_VLLM_API_KEY or HOSTED_VLLM_API_BASE are set in the test environment.
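The utils.py change is a single new case in a provider-to-config dispatch. A simplified sketch of that dispatch pattern (hypothetical; the real function returns config instances and covers many more providers):

```python
from enum import Enum
from typing import Optional


class LlmProviders(str, Enum):
    OPENAI = "openai"
    HOSTED_VLLM = "hosted_vllm"


def get_provider_responses_api_config(provider: LlmProviders) -> Optional[str]:
    # Returning config names for illustration; LiteLLM returns config objects.
    if provider == LlmProviders.OPENAI:
        return "OpenAIResponsesAPIConfig"
    if provider == LlmProviders.HOSTED_VLLM:
        return "HostedVLLMResponsesAPIConfig"  # the case this PR adds
    return None  # callers fall back to chat-completions -> responses conversion
```

Returning None for unregistered providers is what previously sent hosted_vllm down the conversion pipeline; adding the HOSTED_VLLM case is the core routing change.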

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["litellm.responses(model='hosted_vllm/...')"] --> B["ProviderConfigManager.get_provider_responses_api_config()"]
    B --> C{"provider == HOSTED_VLLM?"}
    C -->|Yes| D["HostedVLLMResponsesAPIConfig()"]
    C -->|No / None returned| E["Fallback: chat completions → responses conversion"]
    D --> F["validate_environment()"]
    F --> G["api_key from params / HOSTED_VLLM_API_KEY / 'fake-api-key'"]
    D --> H["get_complete_url()"]
    H --> I["api_base from params / HOSTED_VLLM_API_BASE"]
    I --> J["Append /v1/responses or /responses"]
    G --> K["Direct POST to vLLM /v1/responses"]
    J --> K
    K --> L["OpenAI-compatible Response parsed by base class"]
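The direct path in the flowchart can be sketched end to end as a small helper that combines URL resolution, header construction, and provider-prefix stripping. Hypothetical names throughout; this is not LiteLLM's actual handler code:

```python
import os
from typing import Optional


def build_responses_request(model: str, api_base: Optional[str], api_key: Optional[str]) -> dict:
    """Sketch of the direct hosted_vllm path from the flowchart (not LiteLLM code)."""
    base = api_base or os.environ.get("HOSTED_VLLM_API_BASE")
    if base is None:
        raise ValueError("api_base not set")
    base = base.rstrip("/")
    # Append /responses or /v1/responses depending on whether base already has /v1.
    url = f"{base}/responses" if base.endswith("/v1") else f"{base}/v1/responses"
    key = api_key or os.environ.get("HOSTED_VLLM_API_KEY") or "fake-api-key"
    return {
        "url": url,
        "headers": {"Authorization": f"Bearer {key}"},
        # Strip the "hosted_vllm/" routing prefix before sending the model name to vLLM.
        "json": {"model": model.removeprefix("hosted_vllm/")},
    }
```

The resulting dict is the shape of the direct POST to vLLM's /v1/responses; the OpenAI-compatible response is then parsed by the base class, as the final flowchart node indicates.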

Last reviewed commit: 7240266


@greptile-apps greptile-apps bot left a comment


5 files reviewed, 2 comments


Comment on lines +158 to +166
def test_hosted_vllm_responses_api_url_requires_api_base():
    """Test get_complete_url() raises ValueError when api_base is not set."""
    config = HostedVLLMResponsesAPIConfig()

    with pytest.raises(ValueError, match="api_base not set"):
        config.get_complete_url(
            api_base=None,
            litellm_params={},
        )

Test is fragile when env var is set

test_hosted_vllm_responses_api_url_requires_api_base will not raise ValueError if the HOSTED_VLLM_API_BASE environment variable is set in the test runner's environment (e.g., CI), because get_secret_str("HOSTED_VLLM_API_BASE") will return a value before the None check. Consider patching get_secret_str to return None, or using monkeypatch.delenv to ensure the env var is unset:

Suggested change

Before:

def test_hosted_vllm_responses_api_url_requires_api_base():
    """Test get_complete_url() raises ValueError when api_base is not set."""
    config = HostedVLLMResponsesAPIConfig()
    with pytest.raises(ValueError, match="api_base not set"):
        config.get_complete_url(
            api_base=None,
            litellm_params={},
        )

After:

def test_hosted_vllm_responses_api_url_requires_api_base(monkeypatch):
    """Test get_complete_url() raises ValueError when api_base is not set."""
    monkeypatch.delenv("HOSTED_VLLM_API_BASE", raising=False)
    config = HostedVLLMResponsesAPIConfig()
    with pytest.raises(ValueError, match="api_base not set"):
        config.get_complete_url(
            api_base=None,
            litellm_params={},
        )

Comment on lines +169 to +179
def test_hosted_vllm_validate_environment_default_api_key():
    """Test validate_environment() defaults to 'fake-api-key' when no key is provided."""
    config = HostedVLLMResponsesAPIConfig()

    headers = config.validate_environment(
        headers={},
        model="Qwen/Qwen3-8B",
        litellm_params=GenericLiteLLMParams(),
    )

    assert headers.get("Authorization") == "Bearer fake-api-key"

Test is fragile when env var is set

test_hosted_vllm_validate_environment_default_api_key will fail if HOSTED_VLLM_API_KEY is set in the test environment, because get_secret_str("HOSTED_VLLM_API_KEY") will return a real value instead of falling through to "fake-api-key". Consider clearing the env var:

Suggested change

Before:

def test_hosted_vllm_validate_environment_default_api_key():
    """Test validate_environment() defaults to 'fake-api-key' when no key is provided."""
    config = HostedVLLMResponsesAPIConfig()
    headers = config.validate_environment(
        headers={},
        model="Qwen/Qwen3-8B",
        litellm_params=GenericLiteLLMParams(),
    )
    assert headers.get("Authorization") == "Bearer fake-api-key"

After:

def test_hosted_vllm_validate_environment_default_api_key(monkeypatch):
    """Test validate_environment() defaults to 'fake-api-key' when no key is provided."""
    monkeypatch.delenv("HOSTED_VLLM_API_KEY", raising=False)
    config = HostedVLLMResponsesAPIConfig()
    headers = config.validate_environment(
        headers={},
        model="Qwen/Qwen3-8B",
        litellm_params=GenericLiteLLMParams(),
    )
    assert headers.get("Authorization") == "Bearer fake-api-key"

@krrishdholakia krrishdholakia changed the base branch from main to litellm_oss_staging_02_28_2026 February 28, 2026 03:39
@krrishdholakia krrishdholakia merged commit 1b4cfc2 into BerriAI:litellm_oss_staging_02_28_2026 Feb 28, 2026
26 of 30 checks passed


3 participants