LiteLLM: patch_litellm() does not patch embedding or moderation; aembedding missing entirely #115

@braintrust-bot

Gap

patch_litellm() — the function used by auto_instrument() — only replaces 4 of the 6 functions that LiteLLMWrapper instruments. Embedding and moderation calls made via litellm.embedding() or litellm.moderation() are silently untraced when using auto-instrumentation.

Additionally, LiteLLMWrapper itself is missing aembedding() and amoderation() (async embeddings and async moderation), creating an asymmetry with the rest of the wrapper, which provides both sync and async variants for completion and responses.

patch_litellm() (line 630–667) patches:

| Function | Patched? |
| --- | --- |
| litellm.completion | Yes |
| litellm.acompletion | Yes |
| litellm.responses | Yes |
| litellm.aresponses | Yes |
| litellm.embedding | No |
| litellm.aembedding | No (also missing from LiteLLMWrapper) |
| litellm.moderation | No |
| litellm.amoderation | No (also missing from LiteLLMWrapper) |
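
One way to reproduce the table above is to snapshot the module's functions before patching and check which ones are still the same object afterwards. The sketch below is only illustrative: `fake_litellm` and `fake_patch` are hypothetical stand-ins that mimic the reported behavior, and the helper names `snapshot` and `unpatched` are not part of Braintrust or LiteLLM.

```python
import types

TARGETS = [
    "completion", "acompletion", "responses", "aresponses",
    "embedding", "aembedding", "moderation", "amoderation",
]

def snapshot(module, names=TARGETS):
    """Record the current function object for each name (None if absent)."""
    return {name: getattr(module, name, None) for name in names}

def unpatched(module, before, names=TARGETS):
    """Return the names whose attribute is still the object seen in `before`."""
    return [name for name in names if getattr(module, name, None) is before[name]]

# Hypothetical stand-in for the litellm module: one distinct no-op per name.
fake_litellm = types.SimpleNamespace(
    **{name: (lambda *args, **kwargs: None) for name in TARGETS}
)

def fake_patch(module):
    """Hypothetical patcher that mimics the reported gap: only 4 names replaced."""
    for name in ["completion", "acompletion", "responses", "aresponses"]:
        original = getattr(module, name)
        # Bind `original` as a default so each wrapper keeps its own target.
        setattr(module, name, lambda *a, _orig=original, **k: _orig(*a, **k))

before = snapshot(fake_litellm)
fake_patch(fake_litellm)
missing = unpatched(fake_litellm, before)
print(missing)  # ['embedding', 'aembedding', 'moderation', 'amoderation']
```

Against the real code, the same check would presumably use `import litellm` in place of `fake_litellm` and `patch_litellm()` in place of `fake_patch`, assuming patching replaces the module attributes in place.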

LiteLLMWrapper (line 484–518) exposes:

  • completion() / acompletion(): sync + async
  • responses() / aresponses(): sync + async
  • embedding(): sync only, no async
  • moderation(): sync only, no async
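
For illustration, async parity could look roughly like the sketch below, where the sync and async entry points share one recording helper. `EmbeddingTracer`, `fake_embedding`, `fake_aembedding`, and the logged fields are all hypothetical; this is not the existing EmbeddingWrapper, whose internals are not shown in this issue.

```python
import asyncio

class EmbeddingTracer:
    """Hypothetical wrapper exposing sync and async embedding calls."""

    def __init__(self, sync_fn, async_fn, log):
        self._sync_fn = sync_fn
        self._async_fn = async_fn
        self._log = log  # callable that receives one dict per traced call

    def _record(self, kwargs, response):
        # Shared by both variants so sync and async log the same fields.
        self._log({
            "model": kwargs.get("model"),
            "n_inputs": len(kwargs.get("input", [])),
            "usage": response.get("usage"),
        })

    def embedding(self, **kwargs):
        response = self._sync_fn(**kwargs)
        self._record(kwargs, response)
        return response

    async def aembedding(self, **kwargs):
        response = await self._async_fn(**kwargs)
        self._record(kwargs, response)
        return response

# Stand-ins for the provider calls (assumed dict-shaped responses).
def fake_embedding(**kwargs):
    return {
        "data": [[0.0, 1.0] for _ in kwargs["input"]],
        "usage": {"prompt_tokens": 3},
    }

async def fake_aembedding(**kwargs):
    return fake_embedding(**kwargs)

events = []
tracer = EmbeddingTracer(fake_embedding, fake_aembedding, events.append)
tracer.embedding(model="m", input=["a", "b"])
asyncio.run(tracer.aembedding(model="m", input=["c"]))
```

The same shape would apply to moderation() / amoderation().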

What is missing

  1. patch_litellm() should also patch litellm.embedding and litellm.moderation so auto-instrument users get tracing on these calls.
  2. LiteLLMWrapper should add aembedding() and amoderation() for async parity, and patch_litellm() should patch those as well.
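
A minimal sketch of what the two items above might amount to, assuming patching works by replacing module attributes. `patch_all`, `traced`, the `_is_traced` marker, and the `fake` module are hypothetical names for illustration; the real patch_litellm() would route calls through the existing wrapper classes rather than a bare logging callback.

```python
import asyncio
import functools
import inspect
import types

NAMES = [
    "completion", "acompletion", "responses", "aresponses",
    "embedding", "aembedding", "moderation", "amoderation",
]

def traced(fn, name, log):
    """Wrap `fn` so each call is logged; preserves sync vs. async behavior."""
    if inspect.iscoroutinefunction(fn):
        @functools.wraps(fn)
        async def async_wrapper(*args, **kwargs):
            result = await fn(*args, **kwargs)
            log(name)
            return result
        async_wrapper._is_traced = True
        return async_wrapper

    @functools.wraps(fn)
    def sync_wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        log(name)
        return result
    sync_wrapper._is_traced = True
    return sync_wrapper

def patch_all(module, log, names=NAMES):
    """Replace every listed function on `module`; skip absent or patched ones."""
    for name in names:
        fn = getattr(module, name, None)
        if fn is None or getattr(fn, "_is_traced", False):
            continue
        setattr(module, name, traced(fn, name, log))

# Hypothetical stand-in module with only the four currently unpatched functions.
def _sync(**kwargs):
    return "ok"

async def _async(**kwargs):
    return "ok"

fake = types.SimpleNamespace(
    embedding=_sync, aembedding=_async, moderation=_sync, amoderation=_async
)

calls = []
patch_all(fake, calls.append)
patch_all(fake, calls.append)  # second call is a no-op thanks to the marker

fake.embedding(input=["x"])
asyncio.run(fake.aembedding(input=["x"]))
fake.moderation(input="x")
asyncio.run(fake.amoderation(input="x"))
print(calls)  # ['embedding', 'aembedding', 'moderation', 'amoderation']
```

Dispatching on `inspect.iscoroutinefunction` keeps one code path for all eight names, but it assumes the library's async entry points are declared with `async def`; a function that merely returns an awaitable would need explicit handling.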

Embeddings are a core AI API surface used in RAG pipelines and semantic search. The EmbeddingWrapper class (line 423) already exists and works correctly — it's just not wired into patch_litellm().

Braintrust docs status

not_found — no LiteLLM-specific integration page was found at braintrust.dev/docs/integrations/ai-providers/litellm (404).

Upstream sources

Local files inspected

  • py/src/braintrust/wrappers/litellm.py:
    • LiteLLMWrapper.__init__ (line 484–494) — creates _embedding_wrapper and _moderation_wrapper but no async variants
    • LiteLLMWrapper class (line 496–518) — exposes embedding() and moderation() sync only
    • patch_litellm() (line 630–667) — only patches completion, acompletion, responses, aresponses
    • EmbeddingWrapper class (line 423) — fully implemented with metrics extraction
    • TODO at line 446: "Add a flag to control whether to log the full embedding vector"
  • py/src/braintrust/wrappers/test_litellm.py — has embedding tests but only via wrap_litellm(), not via patch_litellm()
  • Grep for aembedding in litellm.py: zero results
