Support embedding models with vLLM backend#3016

Open
yuancu wants to merge 14 commits into deepjavalibrary:master from yuancu:feat/vllm-embedding-support

Conversation

@yuancu yuancu commented Apr 21, 2026

Description

This PR adds support for embedding models (e.g. intfloat/e5-small) to the vLLM backend through the vLLM handler.

Resolves #3015

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

Feature/Issue validation/testing

  • Unit tests (27 tests in test_vllm_embedding.py, 1 in test_properties_manager.py)

    • Embedding output formatter: single/batch responses, error handling (non-200 status, missing data field), high-dimensional vectors
    • Task-to-runner/convert mapping: all task values (embed, feature-extraction, generate, classify, reward, pooling, auto, unknown)
    • Runner/convert values propagated to engine args; passthrough override behavior verified
    • Request preprocessing: single/batch text, normalize flag propagation to use_activation, model name handling, empty/missing inputs, invalid type rejection
    • Output contract: DJL flat-array shape, content type validation, batch size matching
  • Integration tests (e5-small-vllm, bge-base-vllm)

    • Configs added to prepare.py and client.py with batch sizes [1, 8]
    • Requires Integration Tests Executor run to validate
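The output contract and input validation exercised by the tests above can be sketched as follows. This is a minimal illustration of the described behavior, not the PR's actual code; `build_embedding_request` and `check_response_contract` are hypothetical names, and the exact payload shape is an assumption.

```python
def build_embedding_request(inputs):
    """Wrap a single string or a list of strings into a batch payload.

    Rejects any other type, mirroring the invalid-type rejection
    described in the unit tests. (Hypothetical sketch, not DJL's API.)
    """
    if isinstance(inputs, str):
        inputs = [inputs]
    if not (isinstance(inputs, list)
            and all(isinstance(t, str) for t in inputs)):
        raise TypeError("inputs must be a string or a list of strings")
    return {"inputs": inputs}


def check_response_contract(request, embeddings):
    """Every input maps to one flat list of floats of equal length."""
    assert len(embeddings) == len(request["inputs"]), "batch size mismatch"
    dims = {len(vec) for vec in embeddings}
    assert len(dims) == 1, "embeddings must share one dimensionality"
    assert all(isinstance(x, float) for vec in embeddings for x in vec)
    return True
```

A batch of N input texts must come back as N flat float vectors of identical length, which is what the "DJL flat-array shape" and "batch size matching" bullets verify.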

yuancu added 10 commits April 20, 2026 03:20
EmbeddingCompletionRequest requires request_id field (no default).
- Remove unreachable text-generation branch in _map_task_to_runner_convert
- Use self.vllm_properties.task instead of raw properties dict
- Handle error HTTP status codes in embedding_output_formatter
Unit tests cover: output formatter (single/batch/error/dict/high-dim),
task-to-runner/convert mapping (all task values), engine arg dict generation,
preprocess_request embedding intercept, embedding detection logic,
end-to-end inference flow, and DJL contract compliance.

Integration test configs add e5-small-vllm and bge-base-vllm models.
Use option.normalize in serving.properties (default true) to control
whether embedding results are L2-normalized before returning to clients.
The value is passed through VllmRbProperties and applied in the output
formatter via functools.partial.
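The functools.partial pattern described above can be sketched like this. The function body is illustrative only (the real formatter's response schema and error handling live in the PR); what it shows is how a serving.properties value gets bound into the formatter once, so the engine can later call it with the response alone.

```python
import functools
import math


def embedding_output_formatter(response, normalize=True):
    """Sketch: extract embedding vectors, optionally L2-normalizing them.

    The response schema here is an assumption for illustration.
    """
    if "data" not in response:
        raise ValueError("embedding response missing 'data' field")
    vectors = [item["embedding"] for item in response["data"]]
    if normalize:
        vectors = [_l2_normalize(v) for v in vectors]
    return vectors


def _l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


# Bind the option.normalize value once; downstream code calls
# formatter(response) without knowing about the flag.
formatter = functools.partial(embedding_output_formatter, normalize=True)
```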
- Add input validation for non-string/non-list inputs in embedding preprocessing
- Add comment explaining normalize → use_activation mapping (vllm 0.19.x)
- Make embedding_output_formatter signature explicit (request, tokenizer kwargs)
- Rename RUNNER_VALUES/CONVERT_VALUES to lowercase (local variables, not constants)
- Remove trivial TestEmbeddingDetection tests and stale formatter error tests
…ues only

Remove dead options (reward, mm_encoder_only, pooling, draft, none) that
vLLM does not accept as convert/runner args. Replace branching logic with
a dict lookup of the 5 values users actually set.
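The dict-lookup shape described in this commit might look like the sketch below. The specific runner/convert pairs here are assumptions chosen to illustrate the pattern, not the PR's literal table; only the replacement of if/elif branching with a single lookup is the point.

```python
# Hypothetical mapping from a user-facing task to vLLM (runner, convert)
# engine args. The pairs are illustrative, not the PR's actual values.
_TASK_TO_RUNNER_CONVERT = {
    "text_embedding": ("pooling", "embed"),
    "text-embedding": ("pooling", "embed"),
    "classify": ("pooling", "classify"),
    "generate": ("generate", None),
    "auto": (None, None),  # let vLLM infer from the model
}


def map_task_to_runner_convert(task):
    """One dict lookup replaces the old branching logic."""
    try:
        return _TASK_TO_RUNNER_CONVERT[task]
    except KeyError:
        raise ValueError(f"Unsupported task: {task!r}")
```

Unsupported values fail fast with a clear error instead of silently falling through a chain of branches.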
@yuancu yuancu force-pushed the feat/vllm-embedding-support branch from f8d9f23 to 74a3074 on April 21, 2026 04:35
yuancu added 3 commits April 21, 2026 04:43
…ion warnings

- Handle non-200 responses and missing "data" field in
  embedding_output_formatter to avoid confusing KeyError crashes
- Warn when passthrough engine args override computed runner/convert values
- Replace deprecated asyncio.get_event_loop() with asyncio.run() in tests
Use text_embedding (and text-embedding alias) uniformly as the DJL
embedding task for vLLM, consistent with other engines. Remove embed
as a standalone task; text_embedding maps to vLLM's convert=embed
internally. Update user guide and integration test configs accordingly.
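Under this naming, an embedding deployment's serving.properties might look like the fragment below. `option.task=text_embedding` and `option.normalize` come from this PR's description; the remaining keys are assumptions based on typical DJL Serving LMI configurations and may differ from the actual user guide.

```properties
engine=Python
option.rolling_batch=vllm
option.model_id=intfloat/e5-small
option.task=text_embedding
option.normalize=true
```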
@yuancu yuancu force-pushed the feat/vllm-embedding-support branch from 20a9ff3 to 6661b85 on April 21, 2026 05:40
@yuancu yuancu marked this pull request as ready for review April 21, 2026 05:41
@xyang16
Contributor

xyang16 commented Apr 21, 2026

@yuancu Thanks for contributing! Could you run ./gradlew formatPython to format the code? Thanks.

@yuancu
Author

yuancu commented Apr 21, 2026

@xyang16 Thanks for pointing that out! Fixed :)

Development

Successfully merging this pull request may close these issues.

Support embedding tasks with vLLM backend
