Skip to content

feat(video): enhance video generation quality with dedicated prompter and provider upgrades#43

Merged
tsensei merged 5 commits intomainfrom
feat/video-generation-quality
Apr 10, 2026
Merged

feat(video): enhance video generation quality with dedicated prompter and provider upgrades#43
tsensei merged 5 commits intomainfrom
feat/video-generation-quality

Conversation

@tsensei
Copy link
Copy Markdown
Owner

@tsensei tsensei commented Apr 10, 2026

Summary

Video generation quality enhancement for OpenReels v0.17.0. Two commits:

Video Quality Enhancement:

  • New prompts/video-prompter.md with structured cinematography language (shot type, subject, action, camera movement, lighting, style)
  • Negative prompts on both providers: defaults + archetype antiArtifactGuidance
  • Kling upgraded from v2.1 standard to v2.6 Pro (same $0.07/s pricing, better physics)
  • cfg_scale: 0.5 passed to Kling API
  • Fixed Veo config: removed personGeneration, enhancePrompt, generateAudio, negativePrompt params that veo-3.1-lite-generate-preview rejects (was causing Forbidden errors)
  • Fixed factory videoModel routing: GeminiVideo receives override in both primary/secondary positions, FalVideo always uses its own default
  • motionPrompt and negativePrompt logged in VideoResolution metadata for observability

Documentation:

  • CHANGELOG v0.17.0 entry
  • README: updated video provider versions, Visuals stage description
  • CLAUDE.md: updated test count (349 → 395), video provider versions
  • TODOS.md: added Motion enum expansion (P2) and video safety re-prompting (P2)
  • VERSION: 0.16.0 → 0.17.0

Test Coverage

Tests: 44 files, 395 passing (was 349 on main)

New tests cover:

  • Negative prompt construction (defaults + antiArtifactGuidance, empty fallback)
  • Provider param threading (Gemini negativePrompt, Fal negative_prompt + cfg_scale)
  • Veo config: unsupported params not passed to lite model
  • Kling v2.6 Pro model ID
  • VideoResolution metadata includes motionPrompt/negativePrompt
  • Motion prompt fallback when LLM fails
  • Factory videoModel routing (GeminiVideo gets override, FalVideo gets default)

Pre-Landing Review

No issues found. No frontend files changed. No prompt eval files matched.

TODOS

2 new items added:

  • Expand Motion enum with cinematic camera directions (P2)
  • Video generation safety re-prompting (P2)

Test plan

  • All vitest tests pass (395 tests, 0 failures)
  • Manual Veo API verification: bare config works, unsupported params correctly removed
  • Version bump and changelog updated

🤖 Generated with Claude Code

Talha Jubair Siam added 3 commits April 10, 2026 10:46
…, negative prompts, and provider upgrades

- Add dedicated prompts/video-prompter.md with structured cinematography language
  (shot type, camera movement, temporal progression, physics/realism guidance)
- Add negativePrompt support to VideoProvider interface, threaded through
  video-resolver with defaults + archetype antiArtifactGuidance
- Upgrade Kling from v2.1 standard to v2.6 Pro (same pricing, better physics)
- Add cfg_scale parameter to Fal/Kling provider
- Fix Veo config: remove unsupported params (personGeneration, enhancePrompt,
  generateAudio, negativePrompt) that veo-3.1-lite-generate-preview rejects
- Fix factory videoModel routing: pass to GeminiVideo only, not FalVideo
- Add motionPrompt and negativePrompt to VideoResolution for observability
- Load video-prompter.md instead of image-prompter.md when mode="video"
@tsensei
Copy link
Copy Markdown
Owner Author

tsensei commented Apr 10, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 10, 2026

Claude finished @tsensei's task in 1m 46s —— View job


Code Review: feat(video) — video generation quality enhancement

  • Gather context: read changed files
  • Review video provider changes (Gemini/Fal/video-resolver)
  • Review factory routing fix
  • Review prompt system (video-prompter.md + image-prompter.ts)
  • Review tests
  • Review schema and cost-estimator changes
  • Post findings

Overall this is a clean, well-motivated PR. The Veo Forbidden-error fix alone makes it worth shipping, the negative-prompt plumbing is solid, and the test coverage is genuinely comprehensive. A few things worth a look before merge:


Issues

1. negativePrompt silently dropped for Gemini — no observability signal

gemini.ts:28 accepts negativePrompt in the interface but never uses it. The comment explains why (veo-3.1-lite doesn't support it), but the caller in video-resolver.ts:104 passes it expecting an effect. Right now the negative prompt is logged in VideoResolution.metadata but there's no indication to the user/operator that it was silently discarded by Gemini.

A one-liner console.warn inside generate() (similar to the warn already used elsewhere in the codebase) would surface this clearly:

// In gemini.ts generate(), after the config block:
if (opts.negativePrompt) {
  console.warn(`[video] negativePrompt ignored: veo-3.1-lite-generate-preview does not support it`);
}

Fix this →


2. imageGenTimeMs is always 0 in VideoResolution

video-resolver.ts:62:

const imageGenTimeMs = 0; // Already tracked by caller

Every VideoResolution object emitted by resolveAIVideo will have imageGenTimeMs: 0. The comment is accurate ("already tracked by caller"), but the field being present with a hardcoded 0 is misleading in observability logs and the VideoResolution type. Either:

  • Remove the field from VideoResolution (and the method: "image_fallback" path already omits it correctly — wait, no, it sets imageGenTimeMs there too)
  • Or accept it as a parameter so the real value can be passed through

Since this field is part of the interface defined at line 15-23, I'd at minimum remove imageGenTimeMs from VideoResolution or mark it as deprecated. If the caller already tracks it separately, it's noise here.

Fix this →


3. cfg_scale: 0.5 hardcoded — not in VideoProvider interface

fal.ts:44:

cfg_scale: 0.5,

This is always sent regardless of archetype. 0.5 for Kling means "loosely follow the prompt / prioritize motion diversity". That might be the right default, but it's a meaningful creative parameter (higher = more prompt-faithful, lower = more model-prior). Since antiArtifactGuidance is archetype-specific, a future extension might want per-archetype cfg_scale guidance. Right now this can't be tuned without editing the provider directly.

Not blocking, but worth a // TODO(P2): make cfg_scale configurable per-archetype comment if you want to keep the door open.


4. mimeType: "image/png" hardcoded in both providers

gemini.ts:39 and fal.ts:34:

mimeType: "image/png"   // gemini.ts
type: "image/png"       // fal.ts Blob

The VideoProvider.generate() interface (providers.ts:72) accepts sourceImage: Buffer with no MIME type hint. Today all callers pass PNG (image-gen always produces PNG), so this is fine. But it's a hidden assumption — if the buffer were ever JPEG the API would receive mismatched content.

A minimal guard would be to document this assumption in the interface or the method signature. Not blocking.


5. pickDuration fallback when supportedDurations is empty

video-resolver.ts:40:

return sorted[sorted.length - 1] ?? 5;

If a provider exposes supportedDurations: [], sorted[-1] is undefined and we fall back to 5 seconds. The provider may then reject 5 as unsupported. This is a provider-level contract violation so the ?? 5 fallback is reasonable, but a console.warn here would help debug future provider misconfigurations.


Minor observations

  • image_fallback path missing motionPrompt/negativePrompt in metadata (video-resolver.ts:151-163): when all video providers fail, the fallback VideoResolution omits these fields even though they were computed. They could be useful for diagnosing why generation failed. The fields are optional in the type (motionPrompt?: string) so this isn't a type error, but the debug value is lost.

  • Test placeholder value (video-resolver.test.ts:65): antiArtifactGuidance: "no artifacts" is a very generic mock. This works but a realistic archetype value (e.g. "avoid overexposed highlights, color banding") would make the test more representative. Cosmetic.


What's well done

  • The video-prompter.md is excellent — the 7-element anatomy and 10 motion rules are exactly what video models respond well to, and the length guidance (300-500 chars) is grounded.
  • Negative prompt construction is clean: archetypeGuidance ? \${DEFAULT}, ${archetypeGuidance}` : DEFAULT` is concise and handles the empty-string case correctly (tested).
  • The factory videoModel routing fix is correct and the two new tests ("primary" and "fal primary" cases) cover both branches well.
  • Removing personGeneration, enhancePrompt, generateAudio, negativePrompt from the Veo config is the right call. The comment documenting why is appreciated.
  • Test for "LLM optimization fails, use raw visual_prompt" (video-resolver.test.ts:205) is exactly the failure mode that matters most.
  • imageGenTimeMs: 0 aside, motionPrompt and negativePrompt in VideoResolution is good observability discipline.

Verdict: Items 1 and 2 are worth addressing before merge (low effort, concrete improvement). Items 3-5 are acceptable as-is with a comment or deferred to TODOS.

- Add console.warn when negativePrompt is silently dropped by Gemini's
  veo-3.1-lite model (review finding #1, valid)
@tsensei
Copy link
Copy Markdown
Owner Author

tsensei commented Apr 10, 2026

Code Review Response (Round 2)

Finding 1: negativePrompt silently dropped for Gemini — Fixed

Added console.warn in gemini.ts when negativePrompt is provided but ignored by the lite model. Commit: e162d98.

Finding 2: imageGenTimeMs always 0 — By design, not a bug

The 0 in video-resolver.ts:62 is a deliberate placeholder. The caller in orchestrator.ts:284-286 patches it with the real value after resolveAIVideo returns:

// orchestrator.ts:284-286
if (videoResult.videoResolution) {
  videoResult.videoResolution.imageGenTimeMs = imageGenTimeMs;
}

The resolver doesn't know when image gen started (it receives a pre-generated image buffer), so it sets 0 and lets the caller fill in the actual timing. The final VideoResolution in logs always has the correct value.

Finding 3: cfg_scale hardcoded — Design decision

0.5 is Kling's documented default. It means "balanced between prompt adherence and model creativity." Making this configurable per-archetype is possible but premature — we'd need empirical data on which archetypes benefit from different values. If we add it later, it's a one-line change per archetype JSON + a new field on the interface. Not blocking.

Finding 4: mimeType hardcoded as PNG — Design decision

All image providers (Gemini Imagen, DALL-E) produce PNG. The sourceImage: Buffer in the VideoProvider interface is always PNG because it comes from generateAIImage() which always outputs PNG. Changing this would require the entire image pipeline to support format negotiation. The assumption is correct today and will remain correct unless we add a JPEG image provider (unlikely, since PNG is lossless and better for video source frames).

Finding 5: pickDuration empty array fallback — Design decision

Both GeminiVideo and FalVideo hardcode non-empty supportedDurations arrays ([4, 6, 8] and [5, 10]). An empty array would be a provider implementation bug, not something the resolver should guard against. The ?? 5 is a last-resort safety net, not a real code path. Adding a warn here would be dead code in practice.

Minor observations — Acknowledged

  • image_fallback path missing motionPrompt/negativePrompt: fair point. These were computed before the provider loop, so they'd be useful for debugging why generation failed. Low priority but could add in a follow-up.
  • Test mock values: cosmetic, works fine for the assertions being made.

@tsensei
Copy link
Copy Markdown
Owner Author

tsensei commented Apr 10, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 10, 2026

Claude finished @tsensei's task in 2m 35s —— View job


Code Review: feat(video) — Round 3

  • Gather context from PR and prior review exchange
  • Read changed files: gemini.ts, fal.ts, video-resolver.ts, factory.ts
  • Read prompt file (video-prompter.md) and image-prompter.ts
  • Read test files + orchestrator context
  • Post review findings

Round 3 covers the current branch state after the round-2 fix (negativePrompt warn on gemini.ts). The author's responses to findings 2-5 are well-reasoned and all confirmed by the code.


New finding: motion-prompt LLM usage is silently discarded

video-resolver.ts:67-76:

const optimized = await optimizeImagePrompt(...);
motionPrompt = optimized.prompt;
// optimized.usage ← never used

The optimizeImagePrompt call has real LLM cost (tokens for the motion prompt LLM call), but optimized.usage is dropped. Every other resolver in the codebase that calls optimizeImagePrompt accumulates the usage:

  • adaptive-resolver.ts:261: llmUsages.push(optimized.usage)
  • orchestrator.ts:159: usage = optimized.usage (for image mode)
  • music-resolver.ts:102: returns prompterUsage: promptResult.usage

The video resolver is the only one that doesn't. As a result, video pipeline runs silently under-report LLM cost — every AI video scene incurs an LLM call that's never reflected in the cost estimate.

The fix follows the music-resolver pattern: add a prompterUsage?: LLMUsage field to the resolveAIVideo return type and surface it to the orchestrator for accumulation.

Fix this →


Confirmed from round 2 (no action needed)

imageGenTimeMs = 0: Confirmed correct — orchestrator.ts:284-286 patches videoResult.videoResolution.imageGenTimeMs = imageGenTimeMs after resolveAIVideo returns. One nitpick: the if (videoResult.videoResolution) guard at line 285 checks a non-optional field (the return type always includes videoResolution), so the condition is always true. Harmless but mildly misleading — could be a direct assignment.

cfg_scale, mimeType, pickDuration fallback: All confirmed as acceptable design decisions. No changes needed.


Minor observation (new)

fal.ts:67 vs gemini.ts:81: Gemini checks fs.existsSync(tmpPath) || fs.statSync(tmpPath).size === 0; Fal only checks size. Since fsp.writeFile at line 65 would throw on path failure, this can't cause a silent issue in practice. But the asymmetry is worth aligning for consistency.


What's well done

  • The image-prompter.ts mode-split (VIDEO_PROMPT_PATH / IMAGE_PROMPT_PATH) is cleaner than the old approach of loading the image prompter and appending a video note — the dedicated system prompt can now evolve independently.
  • video-prompter.md is excellent production-quality prompt engineering. The 7-element anatomy with earlier elements weighted higher, one-action-one-camera-move rule, and the 300-500 char guidance are all well-grounded.
  • negativePrompt warn in gemini.ts (round-2 fix) is exactly right.
  • Factory routing (videoModel passed to GeminiVideo, undefined to FalVideo) is correct and both branches are covered by tests.
  • The image_fallback path consistency (provider fails → image is returned) is well-tested.

Verdict: One substantive finding (LLM usage gap) worth addressing before merge — it's a one-field addition that aligns with the existing pattern in every other resolver. Everything else is clean. Good PR overall.

The video resolver's optimizeImagePrompt call has real token cost but
the returned usage was silently discarded. Every other resolver in the
codebase (adaptive-resolver, music-resolver, orchestrator) accumulates
LLM usage for cost reporting. This aligns video-resolver with that
pattern: prompterUsage flows through VisualAssetResult to llmUsages.
@tsensei
Copy link
Copy Markdown
Owner Author

tsensei commented Apr 10, 2026

Code Review Response (Round 3)

Finding: motion prompter LLM usage silently discarded — Fixed

Good catch. optimizeImagePrompt in video-resolver was the only resolver in the codebase that didn't surface its LLM usage for cost accumulation. Every ai_video scene was incurring a motion-prompt LLM call that never showed up in the cost report.

What changed (commit 5f9aec6):

  1. video-resolver.ts: captures optimized.usage into prompterUsage variable, returns it on all 3 exit paths (success, fallback, unreachable)
  2. orchestrator.ts: added prompterUsage?: LLMUsage | null to VisualAssetResult, accumulates it in the scene results loop alongside r.usage
  3. video-resolver.test.ts: 2 new tests — verifies usage is surfaced on success, and null when the LLM call fails

Pattern matches music-resolver.ts (returns prompterUsage: promptResult.usage) and adaptive-resolver.ts (pushes optimized.usage to llmUsages).

Minor observation: fal.ts existsSync vs gemini.ts — Acknowledged

The asymmetry exists because fsp.writeFile (fal) throws on path failure while client.files.download (gemini) may silently produce an empty file. Both paths catch their respective failure modes. Could align for consistency in a follow-up, not blocking.

Confirmed: orchestrator guard at line 285 — Acknowledged

if (videoResult.videoResolution) is always true since the return type mandates it. Harmless guard, could be a direct assignment. Not worth a diff.

397 tests passing.

@tsensei
Copy link
Copy Markdown
Owner Author

tsensei commented Apr 10, 2026

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 10, 2026

Claude finished @tsensei's task in 1m 41s —— View job


Code Review: feat(video) — Round 4

  • Gather context: read changed files
  • Verify round-3 fix (motion prompter LLM usage surfacing)
  • Check orchestrator integration
  • Review test coverage for new fix
  • Final pass on remaining files
  • Post findings

Round 4 covers the current branch state after commit 5f9aec6. All findings from rounds 1–3 have been addressed. This is a clean pass.


Round-3 fix verified: prompterUsage surfacing is correct

video-resolver.ts:79 — Usage is correctly captured:

motionPrompt = optimized.prompt;
prompterUsage = optimized.usage;   // ← captured

All 3 exit paths return prompterUsage:

  • Success path (line 141): prompterUsage included ✅
  • Last-provider-failed fallback (line 168): prompterUsage included ✅
  • Unreachable empty-providers fallback (line 188): prompterUsage included ✅

orchestrator.ts:615-616 — Accumulation is correct:

if (r.usage) llmUsages.push(r.usage);
if (r.prompterUsage) llmUsages.push(r.prompterUsage);

The if (r.prompterUsage) guard correctly handles both null (LLM failed) and undefined (non-video scene types). No double-counting: image scenes route their prompter cost through r.usage (as before); video scenes carry it separately via r.prompterUsage.

Tests (video-resolver.test.ts:205-229):

  • "surfaces motion prompter LLM usage for cost tracking" — verifies { inputTokens: 100, outputTokens: 50 }
  • "returns null prompterUsage when motion prompt LLM fails" — verifies null on exception ✅

All prior design decisions confirmed

Finding Status
negativePrompt warn in gemini.ts:33-35 Correctly emits warn with model name in message ✅
imageGenTimeMs: 0 filled by orchestrator Confirmed — orchestrator.ts:284-286 patches real value ✅
cfg_scale: 0.5 hardcoded in fal.ts:44 Acceptable default; door left open for P2 ✅
mimeType: "image/png" hardcoded Correct assumption for current image pipeline ✅
pickDuration ?? 5 fallback Both providers have hardcoded non-empty arrays; safety net is fine ✅
Factory videoModel routing GeminiVideo(config.videoModel) / FalVideo(undefined) — correct ✅

One minor observation (no action needed)

image-prompter.ts:36-44: If video-prompter.md fails to load, the catch silently falls back to the shorter inline prompt (line 38), then still appends the STYLE BIBLE (line 48). This is correct behavior and intentional. Just noting it so reviewers aren't surprised if they ever see a motion prompt that looks simpler than expected — it means the file was missing.


Verdict

This PR is ready to merge. All substantive issues across four rounds of review have been addressed. The round-3 LLM usage fix is complete and consistent with the pattern established by music-resolver.ts and adaptive-resolver.ts. Test coverage is comprehensive. The Veo Forbidden-error fix, negative prompt plumbing, Kling upgrade, and factory routing fix are all clean.

@tsensei tsensei merged commit d55601b into main Apr 10, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant