
videos#18

Open
sergei-bronnikov wants to merge 6 commits into main from
17481_audio_video

Conversation


@sergei-bronnikov sergei-bronnikov commented Mar 16, 2026

https://bugtracker.codiodev.com/issue/codio-17481/Add-BricksLLM-support-for-OpenAI-Text-to-Speech-and-Speech-to-Text-model-allowlist-audio-routing

Summary by CodeRabbit

  • New Features
    • Added support for new audio transcription models (gpt-4o-transcribe and gpt-4o-mini variants).
    • Introduced video generation and processing endpoints with full proxy support.
    • Enhanced transcription cost calculation using token-based usage tracking.
    • Added video cost estimation based on model and resolution.

destitutus previously approved these changes Mar 18, 2026
@sergei-bronnikov (Author)

@coderabbitai full review


coderabbitai bot commented Apr 10, 2026

✅ Actions performed

Full review triggered.


coderabbitai bot commented Apr 10, 2026

Walkthrough

This PR extends OpenAI provider support by adding cost estimation for audio transcription/translation and video processing, introducing corresponding type definitions, implementing proxy handlers to route new audio models to specialized processors, and registering new video proxy routes with cost estimation.

Changes

  • Audio & Video Cost Estimation — internal/provider/openai/cost.go, internal/provider/openai/types.go: Added audio price entries for gpt-4o-transcribe, gpt-4o-transcribe-diarize, gpt-4o-mini-transcribe, and gpt-4o-mini-tts; introduced transcription-input and transcription-output cost maps; added a video cost map keyed by normalized resolution (720/1024/1080). Added new types for VideoResponseMetadata, TranscriptionResponse, and TranscriptionStreamChunk, plus helper methods for parsing metadata and classifying chunks.
  • Transcription/Translation Proxy — internal/server/web/proxy/audio.go, internal/server/web/proxy/audio_extended.go: Updated getTranscriptionsHandler and getTranslationsHandler to branch on model name and delegate gpt-4o-transcribe* and gpt-4o-mini-transcribe to new processors. Implemented processGPTTranscriptions, processGPTTranslations, and a shared processGPTAudio handler supporting non-streaming JSON/text responses and streaming SSE responses; cost is estimated from token usage when available.
  • Video Proxy Handler — internal/server/web/proxy/video.go: Implemented getVideoHandler to proxy video requests to OpenAI, estimating cost for POST requests via EstimateVideoCost, forwarding response headers, and handling both success and error responses with telemetry recording.
  • Proxy Interface & Routing — internal/server/web/proxy/middleware.go, internal/server/web/proxy/proxy.go: Updated the estimator interface to add a usage parameter to EstimateTranscriptionCost and a new EstimateVideoCost method. Registered new HTTP routes for video collection and resource endpoints (/api/providers/openai/v1/videos and variants) to getVideoHandler.
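To make the resolution-keyed pricing concrete, here is a minimal sketch of how such a per-second video cost map and lookup could work. The map keys follow the model names mentioned in this review, but the prices are invented for illustration, and the function is a sketch rather than the project's actual EstimateVideoCost:

```go
package main

import "fmt"

// Illustrative per-second video cost map keyed by "model-resolution", as
// described in the walkthrough. The prices below are made up for the example.
var videoCostPerSecond = map[string]float64{
	"sora-2-720":      0.10,
	"sora-2-1080":     0.30,
	"sora-2-pro-720":  0.30,
	"sora-2-pro-1080": 0.50,
}

// estimateVideoCost multiplies the per-second rate for the model/resolution
// pair by the clip duration; it errors when the pair is not priced.
func estimateVideoCost(model, resolution string, seconds float64) (float64, error) {
	rate, ok := videoCostPerSecond[fmt.Sprintf("%s-%s", model, resolution)]
	if !ok {
		return 0, fmt.Errorf("no price for %s at %s", model, resolution)
	}
	return rate * seconds, nil
}
```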

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Handler as Audio Handler
    participant Processor as GPT Audio<br/>Processor
    participant OpenAI as OpenAI API
    participant Estimator

    Client->>Handler: POST /audio/transcriptions<br/>(model: gpt-4o-transcribe)
    Handler->>Handler: Extract model from form
    Handler->>Handler: Route check for gpt-4o-*
    Handler->>Processor: processGPTTranscriptions(...)
    Processor->>Processor: Validate request & context
    Processor->>Processor: Build http.Request<br/>(multipart form data)
    Processor->>Processor: Detect streaming mode
    Processor->>Processor: Modify request<br/>(response_format handling)
    Processor->>OpenAI: Execute request
    OpenAI-->>Processor: Non-streaming: 200 OK<br/>TranscriptionResponse
    Processor->>Processor: Unmarshal response
    Processor->>Estimator: EstimateTranscriptionCost<br/>(secs, model, usage)
    Estimator-->>Processor: costInUsd
    Processor->>Processor: Store costInUsd in context
    Processor-->>Client: JSON or text response
    
    Note over Processor,OpenAI: Streaming path:
    OpenAI-->>Processor: newline-delimited chunks
    loop For each SSE chunk
        Processor->>Processor: Unmarshal TranscriptionStreamChunk
        Processor->>Processor: Extract delta/text
        Processor->>Processor: Check if IsDone()
        alt Chunk is done
            Processor->>Estimator: EstimateTranscriptionCost<br/>(accumulated usage)
            Estimator-->>Processor: final costInUsd
        end
        Processor-->>Client: SSE event
    end
    Processor-->>Client: SSE [DONE]
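The streaming loop in the diagram above can be sketched roughly as follows. The chunk field names (type, delta, usage) and the transcript.text.delta / transcript.text.done event types are assumptions based on the walkthrough, not copied from the PR:

```go
package main

import (
	"bufio"
	"encoding/json"
	"strings"
)

// streamChunk is an assumed shape for a transcription SSE event.
type streamChunk struct {
	Type  string `json:"type"`
	Delta string `json:"delta"`
	Usage *struct {
		InputTokens  int `json:"input_tokens"`
		OutputTokens int `json:"output_tokens"`
	} `json:"usage"`
}

// consumeSSE scans newline-delimited SSE events, accumulates delta text, and
// picks up token usage from the terminal chunk for cost estimation.
func consumeSSE(body string) (text string, inTok, outTok int) {
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, "data: ") {
			continue
		}
		payload := strings.TrimPrefix(line, "data: ")
		if payload == "[DONE]" {
			break
		}
		var c streamChunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			continue // skip malformed chunks
		}
		text += c.Delta
		if c.Type == "transcript.text.done" && c.Usage != nil {
			inTok, outTok = c.Usage.InputTokens, c.Usage.OutputTokens
		}
	}
	return
}
```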
sequenceDiagram
    participant Client
    participant Handler as Video Handler
    participant Validator as URL Builder
    participant OpenAI as OpenAI API
    participant Estimator

    Client->>Handler: POST/GET/DELETE<br/>/api/providers/openai/v1/videos
    Handler->>Handler: Validate request & context
    Handler->>Validator: constructVideoURL(path)
    Validator-->>Handler: https://api.openai.com/...
    Handler->>Handler: Create http.Request<br/>(copy method, body, headers)
    Handler->>OpenAI: Execute request
    alt Success (200)
        OpenAI-->>Handler: VideoResponseMetadata
        Handler->>Handler: Unmarshal response
        alt POST request (paid)
            Handler->>Estimator: EstimateVideoCost<br/>(metadata)
            Estimator-->>Handler: costInUsd
            Handler->>Handler: Store costInUsd in context
        end
        Handler-->>Client: Status 200 + response body
    else Error (non-200)
        OpenAI-->>Handler: Error response
        Handler->>Handler: Unmarshal ErrorResponse
        Handler->>Handler: Log error details
        Handler-->>Client: Original status + error body
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • 17073 gpt 5 support #9: Modifies OpenAI provider cost estimation and types alongside the estimator interface in middleware.go, enabling token-based cost calculation.
  • images/ #16: Extends OpenAI provider type definitions and OpenAiPerThousandTokenCost maps for additional media processing features (video in this PR, images in the referenced PR).

Suggested reviewers

  • destitutus
  • AndreyNikitin
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title Check — ❓ Inconclusive: the title 'videos' is vague and does not convey the changeset's primary objectives. Consider a more descriptive title, such as 'Add OpenAI videos, transcription, and translation proxy endpoints' or 'Support GPT-4o audio/video models and streaming transcriptions'.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
internal/server/web/proxy/audio.go (1)

172-176: Extract the GPT audio-model check into one helper.

The same hard-coded model list now drives branching in both handlers. A shared predicate keeps transcription/translation routing from drifting when this allowlist changes again.

Also applies to: 342-346

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/web/proxy/audio.go` around lines 172 - 176, Extract the
hard-coded allowlist into a single helper predicate (e.g.,
isGPTTranscriptionModel(model string) bool) that returns true for
"gpt-4o-transcribe", "gpt-4o-transcribe-diarize", and "gpt-4o-mini-transcribe";
then replace the inline checks in the handler around the model variable and the
other duplicated branch (the block that currently calls
processGPTTranscriptions(c, prod, client, e, model) and the similar block at the
later location) to call this helper instead so both routing points share the
same source of truth.
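Under those assumptions, the suggested predicate might look like this; the function name is the reviewer's proposal, not existing code, and the model list mirrors the allowlist quoted in the comment:

```go
package main

// isGPTTranscriptionModel reports whether a model should be routed to the new
// GPT audio processors, giving both handlers one source of truth.
func isGPTTranscriptionModel(model string) bool {
	switch model {
	case "gpt-4o-transcribe", "gpt-4o-transcribe-diarize", "gpt-4o-mini-transcribe":
		return true
	default:
		return false
	}
}
```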

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7dce883e-09e8-40eb-ac83-14fcdb5c7c91

📥 Commits

Reviewing files that changed from the base of the PR and between ecbe3e2 and 9289ffe.

📒 Files selected for processing (7)
  • internal/provider/openai/cost.go
  • internal/provider/openai/types.go
  • internal/server/web/proxy/audio.go
  • internal/server/web/proxy/audio_extended.go
  • internal/server/web/proxy/middleware.go
  • internal/server/web/proxy/proxy.go
  • internal/server/web/proxy/video.go

Comment on lines +820 to +838
func (ce *CostEstimator) EstimateVideoCost(metadata *VideoResponseMetadata) (float64, error) {
	if metadata == nil {
		return 0, errors.New("metadata is nil")
	}
	costMap, ok := ce.tokenCostMap["video"]
	if !ok {
		return 0, errors.New("video cost map is not provided")
	}
	model := metadata.Model
	size, err := normalizedVideoSize(metadata.Size)
	if err != nil {
		return 0, err
	}
	costKey := fmt.Sprintf("%s-%s", model, size)
	cost, ok := costMap[costKey]
	if !ok {
		return 0, errors.New("model with provided size is not present in the video cost map")
	}
	return cost * metadata.GetSecondsAsFloat(), nil
}


⚠️ Potential issue | 🟠 Major

Support size-less video pricing before forcing model-size lookup.

The new cost map contains plain keys like sora-2 and sora-2-pro, but this implementation always requires metadata.Size and always looks up model-size. Any response without a size will now error and record $0, even though you already have a fallback price configured.

💡 Suggested fix
 func (ce *CostEstimator) EstimateVideoCost(metadata *VideoResponseMetadata) (float64, error) {
 	if metadata == nil {
 		return 0, errors.New("metadata is nil")
 	}
 	costMap, ok := ce.tokenCostMap["video"]
 	if !ok {
 		return 0, errors.New("video cost map is not provided")
 	}
 	model := metadata.Model
-	size, err := normalizedVideoSize(metadata.Size)
-	if err != nil {
-		return 0, err
-	}
-	costKey := fmt.Sprintf("%s-%s", model, size)
-	cost, ok := costMap[costKey]
+	costKey := model
+	if metadata.Size != "" {
+		size, err := normalizedVideoSize(metadata.Size)
+		if err != nil {
+			return 0, err
+		}
+		costKey = fmt.Sprintf("%s-%s", model, size)
+	}
+	cost, ok := costMap[costKey]
 	if !ok {
 		return 0, errors.New("model with provided size is not present in the video cost map")
 	}
 	return cost * metadata.GetSecondsAsFloat(), nil
 }

Also applies to: 841-852

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/provider/openai/cost.go` around lines 820 - 838, EstimateVideoCost
currently mandates a model-size lookup and errors when metadata.Size is absent,
but the cost map may contain fallback keys like "sora-2" (model-only). Change
EstimateVideoCost to try lookups in order: 1) if size is present/normalized, try
"model-size"; 2) if that fails (or size missing/normalization returns an
empty/expected-error), try the model-only key "model"; and only return an error
if neither key exists in ce.tokenCostMap["video"]. Handle normalization errors
by treating missing size as absent (do not immediately return), and update the
same lookup logic in the analogous image pricing function (the one around lines
841-852) so both video and image cost resolution use the model-then-model-only
fallback.

Comment on lines +101 to +106
func (v *VideoResponseMetadata) GetSecondsAsFloat() float64 {
	if secondsFloat, err := strconv.ParseFloat(v.Seconds, 64); err == nil {
		return secondsFloat
	}
	return 0
}


⚠️ Potential issue | 🟠 Major

Don't silently coerce invalid video duration to $0.

If seconds is missing or malformed, this returns 0 and EstimateVideoCost under-bills without any error. Please return an error here, or make EstimateVideoCost validate the raw field before multiplying.

💡 Suggested direction
-func (v *VideoResponseMetadata) GetSecondsAsFloat() float64 {
-	if secondsFloat, err := strconv.ParseFloat(v.Seconds, 64); err == nil {
-		return secondsFloat
-	}
-	return 0
+func (v *VideoResponseMetadata) GetSecondsAsFloat() (float64, error) {
+	return strconv.ParseFloat(v.Seconds, 64)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/provider/openai/types.go` around lines 101 - 106, The current
GetSecondsAsFloat silently returns 0 on parse failure which causes
EstimateVideoCost to under-bill; change GetSecondsAsFloat to return (float64,
error) (or add a new GetSecondsAsFloatSafe that returns (float64, error)) and
propagate/handle the error in callers like EstimateVideoCost and any other call
sites, validating v.Seconds before using it and returning/propagating the parse
error instead of treating malformed or missing seconds as 0 so billing is
correct.

Comment on lines +57 to +67
	isStreaming := ginCtx.PostForm("stream") == "True" || ginCtx.PostForm("stream") == "true"

	if isStreaming {
		req.Header.Set("Accept", "*/*")
		req.Header.Set("Cache-Control", "no-cache")
		req.Header.Set("Connection", "keep-alive")
	}

	if !isStreaming {
		modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler)
	}


⚠️ Potential issue | 🔴 Critical

Streaming GPT audio requests can be proxied with an empty body.

By the time this branch runs, the request form has already been parsed (PostForm("model") in the caller, and PostForm("stream") here). For multipart uploads, that parsing consumes ginCtx.Request.Body. Because the streaming path skips modifyGPTTranscriptionsRequest, upstream can receive EOF instead of the audio payload.

💡 Suggested direction
-	isStreaming := ginCtx.PostForm("stream") == "True" || ginCtx.PostForm("stream") == "true"
+	isStreaming := ginCtx.PostForm("stream") == "True" || ginCtx.PostForm("stream") == "true"

-	if !isStreaming {
-		modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler)
-	}
+	if err := modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler, isStreaming); err != nil {
+		return
+	}

Rebuild the multipart body after any form parsing, regardless of streaming mode.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/web/proxy/audio_extended.go` around lines 57 - 67, The
streaming branch reads form values (ginCtx.PostForm) which drains multipart
request bodies, so when isStreaming is true the audio payload can be lost
because modifyGPTTranscriptionsRequest (which rebuilds the multipart body) is
skipped; ensure the multipart body is reconstructed after any form parsing
regardless of isStreaming by calling or inlining the same body-rebuild logic
used in modifyGPTTranscriptionsRequest (or factoring that logic into a helper)
before proxying in the streaming path so req.Body contains the full multipart
payload for upstream.

Comment on lines +65 to +67
	if !isStreaming {
		modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler)
	}


⚠️ Potential issue | 🟠 Major

Propagate multipart-rewrite failures back to the caller.

modifyGPTTranscriptionsRequest writes error responses internally, but it returns void, so processGPTAudio still continues to client.Do(req). That can double-write the response and mask the actual failure.

💡 Suggested fix
-func modifyGPTTranscriptionsRequest(c *gin.Context, prod bool, log *zap.Logger, req *http.Request, handler string) {
+func modifyGPTTranscriptionsRequest(c *gin.Context, prod bool, log *zap.Logger, req *http.Request, handler string) error {
 	...
 	if err != nil {
 		...
 		JSON(c, http.StatusInternalServerError, "[BricksLLM] cannot write field to buffer")
-		return
+		return err
 	}
 	...
-	req.Body = io.NopCloser(&b)
+	req.Body = io.NopCloser(&b)
+	return nil
 }

And in the caller:

-	modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler)
+	if err := modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler); err != nil {
+		return
+	}

Also applies to: 235-286

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/web/proxy/audio_extended.go` around lines 65 - 67, The call
to modifyGPTTranscriptionsRequest currently swallows multipart-rewrite failures
by writing responses internally and returning void, causing processGPTAudio to
continue and later call client.Do(req) which can double-write the response;
change modifyGPTTranscriptionsRequest to return an error (or a bool + error) and
in processGPTAudio (where modifyGPTTranscriptionsRequest(ginCtx, prod, log, req,
handler) is invoked) check that return value and immediately return from
processGPTAudio if an error/non-ok is returned (so you don't proceed to
client.Do(req)); update all other call sites in the same file (including the
block around lines 235-286) to handle the new return and propagate or log the
error appropriately.

Comment on lines +107 to +114
	// videos
	router.POST("/api/providers/openai/v1/videos", getVideoHandler(prod, client, e))
	router.POST("/api/providers/openai/v1/videos/edits", getVideoHandler(prod, client, e))
	router.POST("/api/providers/openai/v1/videos/extensions", getVideoHandler(prod, client, e))
	router.GET("/api/providers/openai/v1/videos/:video_id", getVideoHandler(prod, client, e))
	router.DELETE("/api/providers/openai/v1/videos/:video_id", getVideoHandler(prod, client, e))
	router.POST("/api/providers/openai/v1/videos/:video_id/remix", getVideoHandler(prod, client, e))
	router.GET("/api/providers/openai/v1/videos/:video_id/content", getVideoHandler(prod, client, e))


⚠️ Potential issue | 🟠 Major

Video routes currently bypass model allowlists.

These endpoints are reachable, but the middleware never sets model for /api/providers/openai/v1/videos requests, so the later isModelAllowed / isModelSupported checks run against "" and allow any video model through. That undercuts the allowlist part of this feature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/web/proxy/proxy.go` around lines 107 - 114, The video routes
call getVideoHandler but never set the request's model value before the later
isModelAllowed / isModelSupported checks, letting empty model pass; update the
routing/middleware so the model is extracted and set on the request context for
all video endpoints (e.g., in the same middleware that other routes use) by
reading the model from the incoming payload/form or defaulting to the intended
video model, ensuring getVideoHandler sees a non-empty model and that
isModelAllowed / isModelSupported are enforced for routes such as the
POST/GET/DELETE handlers for "/api/providers/openai/v1/videos" and its subpaths.
