I am poor #19
Open

SucksToBeAnik wants to merge 2 commits into tsensei:main from SucksToBeAnik:i-am-poor
Conversation

@SucksToBeAnik

feat: zero-API-key local mode (Ollama LLM + Chatterbox TTS + Ollama image gen)

This PR adds a complete local-first pipeline that requires no API keys, enabling anyone to run OpenReels entirely on their own hardware using open-source models. Each provider is independent, so users can freely mix free local providers with paid cloud ones.

What's new

🦙 Ollama LLM provider (--provider ollama)

  • New OllamaLLM provider hitting Ollama's local /api/chat endpoint with structured JSON output (a minimal sketch follows this list)
  • Since Ollama has no web search, the research stage is replaced by an interactive topic brief collected from the user before the pipeline starts (see UX section below)
  • Stub ResearchResult returned when enableWebSearch=true so the pipeline continues seamlessly
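
In outline, the LLM call is a single POST to /api/chat with stream: false and a JSON Schema in the format field, which is Ollama's structured-output mechanism. A minimal sketch; the function name and schema here are illustrative, not this PR's actual code:

```ts
// Minimal sketch, not the PR's actual OllamaLLM code. Ollama's /api/chat
// accepts a JSON Schema in `format` and constrains decoding to it.
async function chatJSON(prompt: string, host = "http://localhost:11434") {
  const res = await fetch(`${host}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.1:8b", // must be the exact pulled tag; bare names 404
      stream: false,
      messages: [{ role: "user", content: prompt }],
      // Hypothetical example schema -- the pipeline's real schemas are richer
      format: {
        type: "object",
        properties: {
          title: { type: "string" },
          key_facts: { type: "array", items: { type: "string" } },
        },
        required: ["title", "key_facts"],
      },
    }),
  });
  if (!res.ok) throw new Error(`Ollama chat failed: HTTP ${res.status}`);
  const data = await res.json();
  return JSON.parse(data.message.content); // content is a JSON string matching the schema
}
```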

🖼️ Ollama image generation (--image-provider ollama)

  • New OllamaImage provider using Ollama's experimental /api/generate image endpoint (sketched after this list)
  • Currently macOS-only (per Ollama's announcement); a clear error with fallback suggestions is shown on Linux/Windows
  • Supports x/flux2-klein:4b, x/flux2-klein:9b, and x/z-image-turbo:latest
  • Fixed response field: Ollama returns data.image (singular), not data.images[]
  • Image generation runs sequentially for Ollama (vs parallel for cloud providers), since local inference serves one request at a time; parallel requests caused all but one to fail
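
A hedged sketch of the image call, assuming the experimental endpoint behaves as described above (prompt in via /api/generate, base64 out via the singular data.image field). Names and fields are illustrative and may change while the API is experimental:

```ts
import { writeFile } from "node:fs/promises";

// Sketch based on this PR's description of the experimental endpoint; the
// request/response fields may change while the API is experimental.
async function generateImage(
  prompt: string,
  outPath: string,
  host = "http://localhost:11434",
) {
  const res = await fetch(`${host}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "x/flux2-klein:4b", prompt, stream: false }),
  });
  if (!res.ok) throw new Error(`Ollama image generation failed: HTTP ${res.status}`);
  const data = await res.json();
  await writeFile(outPath, Buffer.from(data.image, "base64")); // `image`, singular
}
```

Because local inference serves one request at a time, calls like this are awaited one by one in a plain for...of loop rather than fired through Promise.all as with the cloud providers.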

🎙️ Chatterbox Turbo TTS (--tts-provider chatterbox)

  • New ChatterboxTTS provider bridging to a Python subprocess (scripts/chatterbox_tts.py); see the sketch after this list
  • Auto-installs chatterbox-tts into an isolated venv at ~/.openreels/chatterbox-venv on first use — no manual setup required
  • Detects uv and uses it when available (10–100× faster installs, handles uv-managed Pythons that block ensurepip)
  • Falls back to system python3.12/python3.11 with python -m venv
  • Pins setuptools<70 to fix the pkg_resources import required by the perth watermarker dependency
  • Auto-detects device: MPS on Apple Silicon, CPU elsewhere
  • Uses async spawn (not spawnSync) to keep the Node event loop unblocked during model load
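
A rough sketch of the subprocess bridge. The Python CLI flags (--text, --out, --device) are illustrative; the real contract lives in scripts/chatterbox_tts.py:

```ts
import { spawn } from "node:child_process";
import { join } from "node:path";
import { homedir } from "node:os";

// Illustrative flags; the real CLI contract lives in scripts/chatterbox_tts.py.
// Async spawn keeps the event loop responsive while the model loads,
// unlike spawnSync, which would block Node entirely.
function synthesize(text: string, outWav: string, device = "cpu"): Promise<void> {
  const python = join(homedir(), ".openreels", "chatterbox-venv", "bin", "python");
  return new Promise((resolve, reject) => {
    const proc = spawn(python, [
      "scripts/chatterbox_tts.py",
      "--text", text,
      "--out", outWav,
      "--device", device,
    ]);
    proc.stderr.on("data", (chunk) => process.stderr.write(chunk)); // surface install/load progress
    proc.on("error", reject);
    proc.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`chatterbox exited with code ${code}`)),
    );
  });
}
```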

UX highlights

Interactive model selection

When running with --provider ollama or --image-provider ollama without specifying model flags, OpenReels presents a numbered selection menu. Pulled models are shown first with a ✓ marker using their exact pulled tag, since Ollama returns 404 for bare names like gemma3. The image model list is locked to image-capable models only, so LLM models never leak in.
(Screenshot: numbered model-selection menu, pulled models marked with ✓)
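
Under the hood, the exact tags can come from Ollama's GET /api/tags endpoint, which is what makes the menu entries safe to pass straight to /api/chat. A sketch, not the PR's literal code:

```ts
// GET /api/tags returns the exact pulled tags (e.g. "gemma3:4b"),
// which is what the selection menu must pass through unmodified.
async function listPulledModels(host = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${host}/api/tags`);
  if (!res.ok) throw new Error(`Ollama not reachable: HTTP ${res.status}`);
  const data = await res.json();
  return data.models.map((m: { name: string }) => m.name);
}
```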

Interactive topic brief (replaces web search)

Since Ollama has no web search, a guided context-gathering flow runs before the pipeline. Three modes:

  1. Guided — Ollama generates 3 topic-specific questions and the user answers them; the answers become key_facts for the creative director
  2. Freeform — the user writes a few lines, stored as summary
  3. Skip — continues with the topic name only

The brief can also be supplied non-interactively via --brief "..." for Docker/CI use. A sketch of the resulting brief shape follows the screenshot below.
(Screenshot: guided topic-brief question flow)
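
For reference, a hypothetical shape of the collected brief as it enters the pipeline. The field names summary and key_facts come from the description above; the interface itself is illustrative:

```ts
// Hypothetical shape of the collected brief; the field names come from the
// modes above, but the interface itself is illustrative, not the PR's code.
interface TopicBrief {
  summary?: string;     // freeform mode: the user's own lines
  key_facts?: string[]; // guided mode: answers to the 3 generated questions
}
```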

New CLI flags

| Flag | Description |
| --- | --- |
| `--provider ollama` | Use local Ollama for LLM |
| `--tts-provider chatterbox` | Use local Chatterbox Turbo for TTS |
| `--image-provider ollama` | Use local Ollama for image generation |
| `--ollama-model <tag>` | Skip interactive selection (e.g. llama3.1:8b) |
| `--ollama-image-model <tag>` | Skip interactive selection (e.g. x/flux2-klein:4b) |
| `--ollama-host <url>` | Ollama API host (default: http://localhost:11434) |
| `--brief <text>` | Provide topic context non-interactively |
| `--chatterbox-device <device>` | cpu, cuda, or mps |
| `--chatterbox-audio-prompt <path>` | Reference WAV for voice cloning |

Example commands

```bash
# Fully local — no API keys required (macOS)
pnpm start "your topic" \
  --provider ollama \
  --tts-provider chatterbox \
  --image-provider ollama

# Mix: best LLM quality + free TTS + free images
pnpm start "your topic" \
  --provider anthropic \
  --tts-provider chatterbox \
  --image-provider ollama

# Linux/Windows: free LLM + free TTS + Gemini images
pnpm start "your topic" \
  --provider ollama \
  --tts-provider chatterbox \
  --image-provider gemini
```

Known issue

text_card scenes may show a prompt description instead of display text. When using smaller local models, the creative director sometimes writes the visual_prompt for text_card scenes as a style description ("Bold white text: 'Falling into the cave.'") instead of just the display text ("Falling into the cave."). This is a model instruction-following limitation — smaller models don't reliably follow the constraint to write only the verbatim display text.
(Screenshot: text_card scene rendering the style description instead of the display text)


Testing

```bash
# Dry run (no assets, validates pipeline wiring)
pnpm start "your topic" \
  --provider ollama \
  --tts-provider chatterbox \
  --image-provider ollama \
  --dry-run

# Full run
pnpm start "Batman" \
  --provider ollama \
  --tts-provider chatterbox \
  --image-provider ollama
```

Prerequisites: Ollama installed and running (`ollama serve`), at least one LLM model pulled (`ollama pull llama3.1:8b`), Python 3.11 or 3.12 on PATH.

…nd CLI options

- Added support for Ollama as a local LLM and image provider, including interactive model selection.
- Introduced Chatterbox TTS for text-to-speech functionality, with setup instructions and requirements.
- Updated README to reflect new prerequisites and usage instructions for local development.
- Enhanced CLI options to include new parameters for Ollama and Chatterbox configurations.
- Improved cost estimation logic to account for free local providers.
- Added validation for local provider availability and setup processes.

This update significantly expands the capabilities of the pipeline for local development and usage.
…topic context collection

- Updated `collectTopicBrief` function to include LLM provider as a parameter, allowing for dynamic question generation based on the selected LLM.
- Improved user interaction flow for providing topic context, offering options for guided questions, freeform input, or skipping.
- Enhanced cost estimation logic to accommodate the new LLM provider parameter.
- Refined error handling and output messages in the Chatterbox TTS provider for better user experience.
- Updated relevant interfaces and types to ensure consistency across the pipeline.

These changes significantly improve the flexibility and usability of the pipeline for local development.