Support embedding models with vLLM backend by yuancu · Pull Request #3016 · deepjavalibrary/djl-serving

yuancu · 2026-04-21T03:44:03Z

Description

This PR adds support for embedding models (e.g. intfloat/e5-small) to the vLLM backend through the vLLM handler.

Resolves #3015

Type of change

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Checklist:

Please add the link of Integration Tests Executor run with related tests.
Have you manually built the docker image and verified the change?
Have you run related tests? Check how to set up the test environment here; one example would be pytest tests.py -k "TestVllm1" -m "vllm"
Have you added tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

Feature/Issue validation/testing

Unit tests (27 tests in test_vllm_embedding.py, 1 in test_properties_manager.py)
- Embedding output formatter: single/batch responses, error handling (non-200 status, missing data field), high-dimensional vectors
- Task-to-runner/convert mapping: all task values (embed, feature-extraction, generate, classify, reward, pooling, auto, unknown)
- Runner/convert values propagated to engine args; passthrough override behavior verified
- Request preprocessing: single/batch text, normalize flag propagation to use_activation, model name handling, empty/missing inputs, invalid type rejection
- Output contract: DJL flat-array shape, content type validation, batch size matching
Integration tests (e5-small-vllm, bge-base-vllm)
- Configs added to prepare.py and client.py with batch sizes [1, 8]
- Requires Integration Tests Executor run to validate
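
The request preprocessing cases listed above (single or batch text, empty inputs, invalid type rejection) could look roughly like the following. This is a minimal sketch and not the PR's code: the helper name `normalize_embedding_inputs` and the choice of `ValueError` for every rejection are assumptions made for illustration.

```python
from typing import Any, List


def normalize_embedding_inputs(inputs: Any) -> List[str]:
    """Coerce a request's inputs into a non-empty list of strings.

    Hypothetical helper: the name and the error type are assumptions.
    """
    if isinstance(inputs, str):
        texts = [inputs]  # a single text becomes a batch of one
    elif isinstance(inputs, list):
        texts = inputs
    else:
        raise ValueError(
            f"inputs must be a string or a list of strings, got {type(inputs).__name__}"
        )
    if not texts:
        raise ValueError("inputs must not be empty")
    if not all(isinstance(text, str) for text in texts):
        raise ValueError("every element of inputs must be a string")
    return texts
```

Normalizing to a list up front means the rest of the handler only has to deal with the batch case.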

Unit tests cover: output formatter (single/batch/error/dict/high-dim), task-to-runner/convert mapping (all task values), engine arg dict generation, preprocess_request embedding intercept, embedding detection logic, end-to-end inference flow, and DJL contract compliance. Integration test configs add e5-small-vllm and bge-base-vllm models.

Use option.normalize in serving.properties (default true) to control whether embedding results are L2-normalized before returning to clients. The value is passed through VllmRbProperties and applied in the output formatter via functools.partial.
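
As a rough illustration of binding the flag with functools.partial, the sketch below assumes a hypothetical `format_embeddings` function that takes plain lists of floats. The real formatter's signature and the way `option.normalize` reaches it are not shown in this PR text, so treat every name here as a placeholder.

```python
import math
from functools import partial
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def format_embeddings(vectors: List[List[float]], normalize: bool = True) -> List[List[float]]:
    """Hypothetical formatter: optionally L2-normalize each embedding."""
    if normalize:
        return [l2_normalize(v) for v in vectors]
    return [list(v) for v in vectors]


# Bind the configured value once, so later callers only pass the vectors.
raw_formatter = partial(format_embeddings, normalize=False)
unit_formatter = partial(format_embeddings, normalize=True)
```

With this shape, the code that invokes the formatter does not need to know about the configuration property at all.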

- Add input validation for non-string/non-list inputs in embedding preprocessing
- Add comment explaining normalize → use_activation mapping (vllm 0.19.x)
- Make embedding_output_formatter signature explicit (request, tokenizer kwargs)
- Rename RUNNER_VALUES/CONVERT_VALUES to lowercase (local variables, not constants)
- Remove trivial TestEmbeddingDetection tests and stale formatter error tests

…ion warnings

- Handle non-200 responses and missing "data" field in embedding_output_formatter to avoid confusing KeyError crashes
- Warn when passthrough engine args override computed runner/convert values
- Replace deprecated asyncio.get_event_loop() with asyncio.run() in tests
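
The override warning could be implemented along these lines. This is a sketch under assumptions: `merge_engine_args` is a hypothetical helper, and the PR text only states that passthrough values win and that a warning is emitted, not how the merge is structured.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def merge_engine_args(computed: Dict[str, Any], passthrough: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user passthrough args over computed ones, warning on conflicts.

    Hypothetical helper: name and structure are assumptions.
    """
    merged = dict(computed)
    for key, value in passthrough.items():
        if key in computed and computed[key] != value:
            logger.warning(
                "Passthrough engine arg %s=%r overrides computed value %r",
                key, value, computed[key],
            )
        merged[key] = value  # passthrough always takes precedence
    return merged
```

Logging instead of raising keeps the explicit user setting authoritative while still making the conflict visible in the server log.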

Use text_embedding (and text-embedding alias) uniformly as the DJL embedding task for vLLM, consistent with other engines. Remove embed as a standalone task; text_embedding maps to vLLM's convert=embed internally. Update user guide and integration test configs accordingly.

xyang16 · 2026-04-21T17:47:47Z

@yuancu Thanks for contributing! Could you run ./gradlew formatPython to format code? Thanks.

yuancu · 2026-04-21T19:44:01Z

@xyang16 Thanks for pointing that out! Fixed :)

yuancu added 10 commits April 20, 2026 03:20

feat(vllm): add embedding output formatter with DJL flat-array unwrap…

468e6d3

…ping

feat(vllm): map DJL task property to vLLM runner/convert engine args

7fb1e87

feat(vllm): add embedding task support via ServingEmbedding

2a91911

fix(vllm): add request_id to EmbeddingCompletionRequest and uuid import

071f3cc

EmbeddingCompletionRequest requires request_id field (no default).

fix(vllm): address code review findings

7c03ea8

- Remove unreachable text-generation branch in _map_task_to_runner_convert
- Use self.vllm_properties.task instead of raw properties dict
- Handle error HTTP status codes in embedding_output_formatter
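
For the error handling in the output formatter, a minimal sketch might look like this. It assumes an OpenAI-style embeddings body of the form `{"data": [{"embedding": [...]}]}` and a hypothetical function `format_embedding_response`. The actual embedding_output_formatter has a different signature, and how it reports errors to the client is not described here.

```python
from typing import Any, Dict, List


def format_embedding_response(status_code: int, body: Dict[str, Any]) -> List[List[float]]:
    """Extract embedding vectors from a response body, failing with clear errors.

    Hypothetical helper: the body layout and error type are assumptions.
    """
    if status_code != 200:
        # Surface the backend failure instead of trying to parse an error body.
        raise RuntimeError(f"embedding request failed with status {status_code}: {body}")
    data = body.get("data")
    if data is None:
        # Without this check, body["data"] would raise a bare KeyError.
        raise RuntimeError("embedding response has no 'data' field")
    return [item["embedding"] for item in data]
```

Checking the status code before touching the payload is what keeps an error response from surfacing as an unrelated KeyError.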

update embedding user guide with vLLM support

e3987a5

refactor(vllm): simplify task-to-engine-arg mapping to valid vLLM val…

74a3074

…ues only

Remove dead options (reward, mm_encoder_only, pooling, draft, none) that vLLM does not accept as convert/runner args. Replace branching logic with a dict lookup of the 5 values users actually set.
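
The dict-lookup pattern described in this commit can be sketched as follows. The table entries are placeholders chosen for illustration. The five values the PR actually maps are not listed here, and the fallback for unknown tasks is an assumption.

```python
from typing import Dict, Tuple

# Placeholder (runner, convert) pairs; NOT the PR's actual table.
TASK_TO_RUNNER_CONVERT: Dict[str, Tuple[str, str]] = {
    "embed": ("auto", "embed"),
    "classify": ("auto", "classify"),
    "generate": ("generate", "auto"),
    "auto": ("auto", "auto"),
}


def map_task_to_runner_convert(task: str) -> Tuple[str, str]:
    """Look up (runner, convert) for a task; unknown tasks fall back to auto.

    Hypothetical helper: the fallback behaviour is an assumption.
    """
    return TASK_TO_RUNNER_CONVERT.get(task, ("auto", "auto"))
```

Compared with an if/elif chain, a table makes the supported task values visible at a glance and leaves no room for an unreachable branch.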

yuancu force-pushed the feat/vllm-embedding-support branch from f8d9f23 to 74a3074 April 21, 2026 04:35

yuancu added 3 commits April 21, 2026 04:43

add integration test entries for e5 and bge on vllm

7ef02e2

yuancu force-pushed the feat/vllm-embedding-support branch from 20a9ff3 to 6661b85 April 21, 2026 05:40

yuancu marked this pull request as ready for review April 21, 2026 05:41

chore: format python code

b3557ea

yuancu wants to merge 14 commits into deepjavalibrary:master from yuancu:feat/vllm-embedding-support
