feat: on-device TTS (Supertonic), on-device STT (Parakeet), direct API key #68

Open
joceqo wants to merge 4 commits into farzaa:main from joceqo:feature/on-device-tts-stt-direct-api

@joceqo joceqo commented Apr 14, 2026

Summary

  • Supertonic on-device TTS (66M ONNX, ~167× realtime on Apple Silicon): low-latency voice responses with no API key and no internet access after the first model download (~200MB)
  • Parakeet on-device STT (NVIDIA via FluidAudio/CoreML/Neural Engine) — fully local speech recognition, no API key after initial download (~600MB)
  • Direct Anthropic API key input — enter your own sk-ant-... key in the panel UI, Claude calls go straight to api.anthropic.com bypassing the Worker proxy
  • Parakeet restore on launch — fixes BuddyDictationManager always defaulting to AssemblyAI from Info.plist instead of reading the user's saved selection
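
To put the ~167× realtime figure in perspective, here is a small back-of-the-envelope sketch. It assumes the quoted factor holds for a given clip and ignores model load time and audio playback setup; the function name is illustrative and not part of this PR.

```swift
/// Estimated wall-clock time to synthesize a clip at a given realtime factor.
/// A factor of 167 means 167 seconds of audio are produced per second of compute.
func estimatedSynthesisSeconds(audioSeconds: Double, realtimeFactor: Double) -> Double {
    precondition(realtimeFactor > 0, "realtime factor must be positive")
    return audioSeconds / realtimeFactor
}

// A 10-second reply at the quoted ~167x factor takes roughly 0.06 s to synthesize.
let estimate = estimatedSynthesisSeconds(audioSeconds: 10, realtimeFactor: 167)
print(estimate)
```

Under that assumption, synthesis time is small compared with the Claude round trip, which is why the responses feel immediate even though latency is not literally zero.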

With Parakeet + Supertonic + direct API key, the only network call is to api.anthropic.com — zero Worker dependency.
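
The direct-key routing described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `makeClaudeRequest` and `workerProxyURL` are hypothetical names, and the Worker URL is a placeholder because the real one is not shown here. The `x-api-key` and `anthropic-version` headers are the ones the Anthropic Messages API expects.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Placeholder only; the real Worker proxy URL is not given in this PR.
let workerProxyURL = URL(string: "https://example-worker.invalid/chat")!
let anthropicMessagesURL = URL(string: "https://api.anthropic.com/v1/messages")!

/// Builds the request for a Claude call.
/// Goes straight to api.anthropic.com when the user has entered a key,
/// otherwise falls back to the Worker proxy (which holds its own credentials).
func makeClaudeRequest(userAPIKey: String?, body: Data) -> URLRequest {
    let trimmedKey = userAPIKey?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
    let isDirect = !trimmedKey.isEmpty

    var request = URLRequest(url: isDirect ? anthropicMessagesURL : workerProxyURL)
    request.httpMethod = "POST"
    request.httpBody = body
    request.setValue("application/json", forHTTPHeaderField: "content-type")

    if isDirect {
        // Only attach the user's key when talking to Anthropic directly.
        request.setValue(trimmedKey, forHTTPHeaderField: "x-api-key")
        request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    }
    return request
}
```

Treating an empty or whitespace-only key as "no key" means clearing the field in the panel falls back to the Worker proxy, which matches the test plan.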

Both TTS and STT providers are selectable at runtime via new segmented pickers in the panel, persisted to UserDefaults. All existing providers (ElevenLabs, AssemblyAI) still work unchanged.
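
As a rough sketch of the persistence and restore-on-launch behavior, assuming a string-backed enum stored in UserDefaults: the type name, function names, and defaults key below are hypothetical and may differ from what BuddyDictationManager actually uses.

```swift
import Foundation

/// Speech-to-text providers named in this PR.
enum SpeechProvider: String, CaseIterable {
    case assemblyAI
    case parakeet
}

/// Hypothetical key; the PR text does not show the real one.
let speechProviderDefaultsKey = "selectedSpeechProvider"

/// Called when the user changes the segmented picker.
func saveSpeechProvider(_ provider: SpeechProvider, to defaults: UserDefaults) {
    defaults.set(provider.rawValue, forKey: speechProviderDefaultsKey)
}

/// Called on launch. Prefers the user's saved selection and only falls back
/// to the bundled default (for example, one read from Info.plist) if nothing
/// valid has been saved.
func restoreSpeechProvider(
    from defaults: UserDefaults,
    bundledDefault: SpeechProvider = .assemblyAI
) -> SpeechProvider {
    guard let raw = defaults.string(forKey: speechProviderDefaultsKey),
          let saved = SpeechProvider(rawValue: raw) else {
        return bundledDefault
    }
    return saved
}
```

The ordering in `restoreSpeechProvider` is the point of the fix: the saved selection is consulted first, so a relaunch with Parakeet selected does not reset to AssemblyAI.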

New dependencies (Xcode → File → Add Package Dependencies)

| Package | URL | Purpose |
| --- | --- | --- |
| onnxruntime-swift-package-manager | https://github.com/microsoft/onnxruntime-swift-package-manager.git | ONNX Runtime for Supertonic |
| FluidAudio | https://github.com/FluidInference/FluidAudio.git | Parakeet CoreML models |

Related

See #28 for LM Studio / Gemma 4 local model integration — complementary to this PR (on-device TTS/STT vs on-device LLM).

Test plan

  • Select Supertonic as voice provider, press hotkey, verify on-device TTS plays audio
  • Select Parakeet as speech provider, press hotkey, verify on-device transcription works
  • Enter Anthropic API key in panel, verify "Direct" badge and Claude calls work without Worker
  • Clear API key, verify fallback to Worker proxy
  • Quit and relaunch with Parakeet selected — verify it's still selected (not reset to AssemblyAI)
  • Click the Test button for both the TTS and STT providers and verify each one runs

🤖 Generated with Claude Code

claude and others added 4 commits April 14, 2026 06:32
…ctable options

Supertonic (66M ONNX, ~167× realtime on Apple Silicon) replaces the ElevenLabs
cloud TTS call entirely on-device with no API key or internet after first use.
Parakeet (NVIDIA via FluidAudio/CoreML) replaces AssemblyAI streaming with fully
local ASR on the Neural Engine, also no API key after initial model download.

Both are selectable at runtime via new "Voice" and "Speech" segmented pickers in
the menu bar panel, persisted to UserDefaults. All existing providers still work.

Requires two Xcode package dependencies (see CLAUDE.md):
  - microsoft/onnxruntime-swift-package-manager (Supertonic)
  - FluidInference/FluidAudio (Parakeet)

https://claude.ai/code/session_01KAKiAyGESHfP4cNGeVJmi8
…launch

Allow users to enter their own Anthropic API key in the panel UI,
bypassing the Cloudflare Worker proxy entirely. With Parakeet (on-device
STT) + Supertonic (on-device TTS) + direct API key, the only network
call is to api.anthropic.com — zero Worker dependency.

Also fixes Parakeet not being restored as the STT provider on app
restart (BuddyDictationManager was always defaulting to AssemblyAI
from Info.plist instead of reading the UserDefaults selection).

Adds bundle path to startup log for TCC debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts signing team to Farza's original ID so the PR doesn't break
the build for other contributors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>