Conversation

@coderabbitai full review

✅ Actions performed

Full review triggered.
Walkthrough

This PR extends OpenAI provider support by adding cost estimation for audio transcription/translation and video processing, introducing corresponding type definitions, implementing proxy handlers to route new audio models to specialized processors, and registering new video proxy routes with cost estimation.

Changes

Sequence Diagram(s)

sequenceDiagram
participant Client
participant Handler as Audio Handler
participant Processor as GPT Audio<br/>Processor
participant OpenAI as OpenAI API
participant Estimator
Client->>Handler: POST /audio/transcriptions<br/>(model: gpt-4o-transcribe)
Handler->>Handler: Extract model from form
Handler->>Handler: Route check for gpt-4o-*
Handler->>Processor: processGPTTranscriptions(...)
Processor->>Processor: Validate request & context
Processor->>Processor: Build http.Request<br/>(multipart form data)
Processor->>Processor: Detect streaming mode
Processor->>Processor: Modify request<br/>(response_format handling)
Processor->>OpenAI: Execute request
OpenAI-->>Processor: Non-streaming: 200 OK<br/>TranscriptionResponse
Processor->>Processor: Unmarshal response
Processor->>Estimator: EstimateTranscriptionCost<br/>(secs, model, usage)
Estimator-->>Processor: costInUsd
Processor->>Processor: Store costInUsd in context
Processor-->>Client: JSON or text response
Note over Processor,OpenAI: Streaming path:
OpenAI-->>Processor: newline-delimited chunks
loop For each SSE chunk
Processor->>Processor: Unmarshal TranscriptionStreamChunk
Processor->>Processor: Extract delta/text
Processor->>Processor: Check if IsDone()
alt Chunk is done
Processor->>Estimator: EstimateTranscriptionCost<br/>(accumulated usage)
Estimator-->>Processor: final costInUsd
end
Processor-->>Client: SSE event
end
Processor-->>Client: SSE [DONE]
sequenceDiagram
participant Client
participant Handler as Video Handler
participant Validator as URL Builder
participant OpenAI as OpenAI API
participant Estimator
Client->>Handler: POST/GET/DELETE<br/>/api/providers/openai/v1/videos
Handler->>Handler: Validate request & context
Handler->>Validator: constructVideoURL(path)
Validator-->>Handler: https://api.openai.com/...
Handler->>Handler: Create http.Request<br/>(copy method, body, headers)
Handler->>OpenAI: Execute request
alt Success (200)
OpenAI-->>Handler: VideoResponseMetadata
Handler->>Handler: Unmarshal response
alt POST request (paid)
Handler->>Estimator: EstimateVideoCost<br/>(metadata)
Estimator-->>Handler: costInUsd
Handler->>Handler: Store costInUsd in context
end
Handler-->>Client: Status 200 + response body
else Error (non-200)
OpenAI-->>Handler: Error response
Handler->>Handler: Unmarshal ErrorResponse
Handler->>Handler: Log error details
Handler-->>Client: Original status + error body
end
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Comment
Actionable comments posted: 5
🧹 Nitpick comments (1)
internal/server/web/proxy/audio.go (1)
172-176: Extract the GPT audio-model check into one helper.

The same hard-coded model list now drives branching in both handlers. A shared predicate keeps transcription/translation routing from drifting when this allowlist changes again.
Also applies to: 342-346
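A minimal sketch of the shared predicate the nitpick asks for; the model names come from the review itself, while the function name is the reviewer's suggestion, not existing project code:

```go
package main

import "fmt"

// isGPTTranscriptionModel is a single source of truth for the GPT
// audio-model allowlist, so the transcription and translation handlers
// cannot drift apart when the list changes.
func isGPTTranscriptionModel(model string) bool {
	switch model {
	case "gpt-4o-transcribe", "gpt-4o-transcribe-diarize", "gpt-4o-mini-transcribe":
		return true
	}
	return false
}

func main() {
	fmt.Println(isGPTTranscriptionModel("gpt-4o-transcribe")) // true
	fmt.Println(isGPTTranscriptionModel("whisper-1"))         // false
}
```

Both routing points in audio.go would then call this helper instead of repeating the inline string comparisons.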
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/server/web/proxy/audio.go` around lines 172 - 176, Extract the hard-coded allowlist into a single helper predicate (e.g., isGPTTranscriptionModel(model string) bool) that returns true for "gpt-4o-transcribe", "gpt-4o-transcribe-diarize", and "gpt-4o-mini-transcribe"; then replace the inline checks in the handler around the model variable and the other duplicated branch (the block that currently calls processGPTTranscriptions(c, prod, client, e, model) and the similar block at the later location) to call this helper instead so both routing points share the same source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/provider/openai/cost.go`:
- Around line 820-838: EstimateVideoCost currently mandates a model-size lookup
and errors when metadata.Size is absent, but the cost map may contain fallback
keys like "sora-2" (model-only). Change EstimateVideoCost to try lookups in
order: 1) if size is present/normalized, try "model-size"; 2) if that fails (or
size missing/normalization returns an empty/expected-error), try the model-only
key "model"; and only return an error if neither key exists in
ce.tokenCostMap["video"]. Handle normalization errors by treating missing size
as absent (do not immediately return), and update the same lookup logic in the
analogous image pricing function (the one around lines 841-852) so both video
and image cost resolution use the model-then-model-only fallback.
In `@internal/provider/openai/types.go`:
- Around line 101-106: The current GetSecondsAsFloat silently returns 0 on parse
failure which causes EstimateVideoCost to under-bill; change GetSecondsAsFloat
to return (float64, error) (or add a new GetSecondsAsFloatSafe that returns
(float64, error)) and propagate/handle the error in callers like
EstimateVideoCost and any other call sites, validating v.Seconds before using it
and returning/propagating the parse error instead of treating malformed or
missing seconds as 0 so billing is correct.
In `@internal/server/web/proxy/audio_extended.go`:
- Around line 65-67: The call to modifyGPTTranscriptionsRequest currently
swallows multipart-rewrite failures by writing responses internally and
returning void, causing processGPTAudio to continue and later call
client.Do(req) which can double-write the response; change
modifyGPTTranscriptionsRequest to return an error (or a bool + error) and in
processGPTAudio (where modifyGPTTranscriptionsRequest(ginCtx, prod, log, req,
handler) is invoked) check that return value and immediately return from
processGPTAudio if an error/non-ok is returned (so you don't proceed to
client.Do(req)); update all other call sites in the same file (including the
block around lines 235-286) to handle the new return and propagate or log the
error appropriately.
- Around line 57-67: The streaming branch reads form values (ginCtx.PostForm)
which drains multipart request bodies, so when isStreaming is true the audio
payload can be lost because modifyGPTTranscriptionsRequest (which rebuilds the
multipart body) is skipped; ensure the multipart body is reconstructed after any
form parsing regardless of isStreaming by calling or inlining the same
body-rebuild logic used in modifyGPTTranscriptionsRequest (or factoring that
logic into a helper) before proxying in the streaming path so req.Body contains
the full multipart payload for upstream.
In `@internal/server/web/proxy/proxy.go`:
- Around line 107-114: The video routes call getVideoHandler but never set the
request's model value before the later isModelAllowed / isModelSupported checks,
letting empty model pass; update the routing/middleware so the model is
extracted and set on the request context for all video endpoints (e.g., in the
same middleware that other routes use) by reading the model from the incoming
payload/form or defaulting to the intended video model, ensuring getVideoHandler
sees a non-empty model and that isModelAllowed / isModelSupported are enforced
for routes such as the POST/GET/DELETE handlers for
"/api/providers/openai/v1/videos" and its subpaths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7dce883e-09e8-40eb-ac83-14fcdb5c7c91
📒 Files selected for processing (7)
internal/provider/openai/cost.go
internal/provider/openai/types.go
internal/server/web/proxy/audio.go
internal/server/web/proxy/audio_extended.go
internal/server/web/proxy/middleware.go
internal/server/web/proxy/proxy.go
internal/server/web/proxy/video.go
func (ce *CostEstimator) EstimateVideoCost(metadata *VideoResponseMetadata) (float64, error) {
	if metadata == nil {
		return 0, errors.New("metadata is nil")
	}
	costMap, ok := ce.tokenCostMap["video"]
	if !ok {
		return 0, errors.New("video cost map is not provided")
	}
	model := metadata.Model
	size, err := normalizedVideoSize(metadata.Size)
	if err != nil {
		return 0, err
	}
	costKey := fmt.Sprintf("%s-%s", model, size)
	cost, ok := costMap[costKey]
	if !ok {
		return 0, errors.New("model with provided size is not present in the video cost map")
	}
	return cost * metadata.GetSecondsAsFloat(), nil
}
Support size-less video pricing before forcing model-size lookup.
The new cost map contains plain keys like sora-2 and sora-2-pro, but this implementation always requires metadata.Size and always looks up model-size. Any response without a size will now error and record $0, even though you already have a fallback price configured.
💡 Suggested fix
func (ce *CostEstimator) EstimateVideoCost(metadata *VideoResponseMetadata) (float64, error) {
if metadata == nil {
return 0, errors.New("metadata is nil")
}
costMap, ok := ce.tokenCostMap["video"]
if !ok {
return 0, errors.New("video cost map is not provided")
}
model := metadata.Model
- size, err := normalizedVideoSize(metadata.Size)
- if err != nil {
- return 0, err
- }
- costKey := fmt.Sprintf("%s-%s", model, size)
- cost, ok := costMap[costKey]
+ costKey := model
+ if metadata.Size != "" {
+ size, err := normalizedVideoSize(metadata.Size)
+ if err != nil {
+ return 0, err
+ }
+ costKey = fmt.Sprintf("%s-%s", model, size)
+ }
+ cost, ok := costMap[costKey]
if !ok {
return 0, errors.New("model with provided size is not present in the video cost map")
}
return cost * metadata.GetSecondsAsFloat(), nil
}

Also applies to: 841-852
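The model-then-model-only lookup order can be sketched as a standalone helper; the map here is a simplified stand-in for ce.tokenCostMap["video"], and the helper name is illustrative:

```go
package main

import "fmt"

// resolveVideoCost looks up a per-second price, preferring the
// "model-size" key and falling back to the bare model key, so
// responses without a size still hit configured fallback prices
// like "sora-2".
func resolveVideoCost(costMap map[string]float64, model, size string) (float64, error) {
	if size != "" {
		if cost, ok := costMap[fmt.Sprintf("%s-%s", model, size)]; ok {
			return cost, nil
		}
	}
	if cost, ok := costMap[model]; ok {
		return cost, nil
	}
	return 0, fmt.Errorf("no video cost entry for model %q (size %q)", model, size)
}

func main() {
	costs := map[string]float64{"sora-2": 0.1, "sora-2-720x1280": 0.12}
	c, _ := resolveVideoCost(costs, "sora-2", "720x1280")
	fmt.Println(c) // size-specific key wins
	c, _ = resolveVideoCost(costs, "sora-2", "")
	fmt.Println(c) // falls back to the model-only key
}
```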
func (v *VideoResponseMetadata) GetSecondsAsFloat() float64 {
	if secondsFloat, err := strconv.ParseFloat(v.Seconds, 64); err == nil {
		return secondsFloat
	}
	return 0
}
Don't silently coerce invalid video duration to $0.
If seconds is missing or malformed, this returns 0 and EstimateVideoCost under-bills without any error. Please return an error here, or make EstimateVideoCost validate the raw field before multiplying.
💡 Suggested direction
-func (v *VideoResponseMetadata) GetSecondsAsFloat() float64 {
- if secondsFloat, err := strconv.ParseFloat(v.Seconds, 64); err == nil {
- return secondsFloat
- }
- return 0
+func (v *VideoResponseMetadata) GetSecondsAsFloat() (float64, error) {
+ return strconv.ParseFloat(v.Seconds, 64)
 }
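A minimal sketch of the suggested direction, with a hypothetical caller showing how the parse error stops billing instead of silently charging $0 (the struct is reduced to the one field the snippet uses):

```go
package main

import (
	"fmt"
	"strconv"
)

// VideoResponseMetadata mirrors the relevant field from the snippet above.
type VideoResponseMetadata struct {
	Seconds string
}

// GetSecondsAsFloat surfaces the parse error instead of coercing
// malformed or missing input to 0, so callers can refuse to under-bill.
func (v *VideoResponseMetadata) GetSecondsAsFloat() (float64, error) {
	return strconv.ParseFloat(v.Seconds, 64)
}

func main() {
	good := &VideoResponseMetadata{Seconds: "8"}
	if secs, err := good.GetSecondsAsFloat(); err == nil {
		fmt.Println("billable seconds:", secs)
	}
	bad := &VideoResponseMetadata{Seconds: ""}
	if _, err := bad.GetSecondsAsFloat(); err != nil {
		fmt.Println("refusing to bill: invalid duration")
	}
}
```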
isStreaming := ginCtx.PostForm("stream") == "True" || ginCtx.PostForm("stream") == "true"

if isStreaming {
	req.Header.Set("Accept", "*/*")
	req.Header.Set("Cache-Control", "no-cache")
	req.Header.Set("Connection", "keep-alive")
}

if !isStreaming {
	modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler)
}
Streaming GPT audio requests can be proxied with an empty body.
By the time this branch runs, the request form has already been parsed (PostForm("model") in the caller, and PostForm("stream") here). For multipart uploads that consumes ginCtx.Request.Body. Because the streaming path skips modifyGPTTranscriptionsRequest, upstream can receive EOF instead of the audio payload.
💡 Suggested direction
- isStreaming := ginCtx.PostForm("stream") == "True" || ginCtx.PostForm("stream") == "true"
+ isStreaming := ginCtx.PostForm("stream") == "True" || ginCtx.PostForm("stream") == "true"
- if !isStreaming {
- modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler)
- }
+ if err := modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler, isStreaming); err != nil {
+ return
+	}

Rebuild the multipart body after any form parsing, regardless of streaming mode.
if !isStreaming {
	modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler)
}
Propagate multipart-rewrite failures back to the caller.
modifyGPTTranscriptionsRequest writes error responses internally, but it returns void, so processGPTAudio still continues to client.Do(req). That can double-write the response and mask the actual failure.
💡 Suggested fix
-func modifyGPTTranscriptionsRequest(c *gin.Context, prod bool, log *zap.Logger, req *http.Request, handler string) {
+func modifyGPTTranscriptionsRequest(c *gin.Context, prod bool, log *zap.Logger, req *http.Request, handler string) error {
...
if err != nil {
...
JSON(c, http.StatusInternalServerError, "[BricksLLM] cannot write field to buffer")
- return
+ return err
}
...
- req.Body = io.NopCloser(&b)
+ req.Body = io.NopCloser(&b)
+ return nil
}

And in the caller:
- modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler)
+ if err := modifyGPTTranscriptionsRequest(ginCtx, prod, log, req, handler); err != nil {
+ return
+	}

Also applies to: 235-286
// videos
router.POST("/api/providers/openai/v1/videos", getVideoHandler(prod, client, e))
router.POST("/api/providers/openai/v1/videos/edits", getVideoHandler(prod, client, e))
router.POST("/api/providers/openai/v1/videos/extensions", getVideoHandler(prod, client, e))
router.GET("/api/providers/openai/v1/videos/:video_id", getVideoHandler(prod, client, e))
router.DELETE("/api/providers/openai/v1/videos/:video_id", getVideoHandler(prod, client, e))
router.POST("/api/providers/openai/v1/videos/:video_id/remix", getVideoHandler(prod, client, e))
router.GET("/api/providers/openai/v1/videos/:video_id/content", getVideoHandler(prod, client, e))
Video routes currently bypass model allowlists.
These endpoints are reachable, but the middleware never sets model for /api/providers/openai/v1/videos requests, so the later isModelAllowed / isModelSupported checks run against "" and allow any video model through. That undercuts the allowlist part of this feature.
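One way to close the allowlist gap is to extract the model from the video request body before the checks run. The sketch below uses only stdlib JSON parsing; the helper name and the fallback model are illustrative assumptions, not the project's actual middleware:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractVideoModel pulls the "model" field from a video-creation
// request body so isModelAllowed / isModelSupported see a real value.
// If the body carries no model (e.g. GET/DELETE by video id), it
// falls back to a default video model instead of the empty string.
func extractVideoModel(body []byte, fallback string) string {
	var payload struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &payload); err == nil && payload.Model != "" {
		return payload.Model
	}
	return fallback
}

func main() {
	fmt.Println(extractVideoModel([]byte(`{"model":"sora-2","prompt":"a cat"}`), "sora-2"))
	fmt.Println(extractVideoModel(nil, "sora-2")) // no body: default applies
}
```

The middleware would set this value on the request context for every /api/providers/openai/v1/videos route, so the allowlist checks never run against "".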
https://bugtracker.codiodev.com/issue/codio-17481/Add-BricksLLM-support-for-OpenAI-Text-to-Speech-and-Speech-to-Text-model-allowlist-audio-routing
Summary by CodeRabbit