feat: on-device TTS (Supertonic), on-device STT (Parakeet), direct API key #68
Open
joceqo wants to merge 4 commits into farzaa:main from
…ctable options

Supertonic (66M ONNX, ~167× realtime on Apple Silicon) replaces the ElevenLabs cloud TTS call entirely on-device with no API key or internet after first use. Parakeet (NVIDIA via FluidAudio/CoreML) replaces AssemblyAI streaming with fully local ASR on the Neural Engine, also no API key after initial model download.

Both are selectable at runtime via new "Voice" and "Speech" segmented pickers in the menu bar panel, persisted to UserDefaults. All existing providers still work.

Requires two Xcode package dependencies (see CLAUDE.md):
- microsoft/onnxruntime-swift-package-manager (Supertonic)
- FluidInference/FluidAudio (Parakeet)

https://claude.ai/code/session_01KAKiAyGESHfP4cNGeVJmi8
…launch

Allow users to enter their own Anthropic API key in the panel UI, bypassing the Cloudflare Worker proxy entirely. With Parakeet (on-device STT) + Supertonic (on-device TTS) + direct API key, the only network call is to api.anthropic.com, with zero Worker dependency.

Also fixes Parakeet not being restored as the STT provider on app restart (BuddyDictationManager was always defaulting to AssemblyAI from Info.plist instead of reading the UserDefaults selection). Adds bundle path to startup log for TCC debugging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverts signing team to Farza's original ID so the PR doesn't break the build for other contributors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
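The restart fix described in the commit messages above (read the saved UserDefaults selection first, and fall back to the Info.plist default only when nothing is saved) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `SpeechProvider` enum, its raw values, the `selectedSpeechProvider` key, and the `resolveSpeechProvider` function are all hypothetical names chosen for the example.

```swift
import Foundation

// Hypothetical provider enum; the raw values are illustrative only.
enum SpeechProvider: String {
    case assemblyAI
    case parakeet
}

// Hypothetical UserDefaults key for the persisted picker selection.
let speechProviderDefaultsKey = "selectedSpeechProvider"

/// Resolve the STT provider at launch.
/// Order: saved user selection, then the bundled default, then AssemblyAI.
func resolveSpeechProvider(defaults: UserDefaults, bundledDefault: String?) -> SpeechProvider {
    // 1. A valid saved selection always wins, so the choice survives restarts.
    if let saved = defaults.string(forKey: speechProviderDefaultsKey),
       let provider = SpeechProvider(rawValue: saved) {
        return provider
    }
    // 2. Otherwise use the default shipped with the app (e.g. read from Info.plist).
    if let raw = bundledDefault, let provider = SpeechProvider(rawValue: raw) {
        return provider
    }
    // 3. Last resort: the original cloud provider.
    return .assemblyAI
}
```

At launch, the bundled default could be read with `Bundle.main.object(forInfoDictionaryKey:)` and passed in as `bundledDefault`; the Info.plist key name would be whatever the app already uses.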
Summary
With an sk-ant-... key entered in the panel UI, Claude calls go straight to api.anthropic.com, bypassing the Worker proxy.

With Parakeet + Supertonic + direct API key, the only network call is to api.anthropic.com, with zero Worker dependency.

Both TTS and STT providers are selectable at runtime via new segmented pickers in the panel, persisted to UserDefaults. All existing providers (ElevenLabs, AssemblyAI) still work unchanged.
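To make the direct-key routing concrete, here is a hedged sketch of how a request builder might choose between the two paths. It is not taken from this PR: `makeClaudeRequest`, its parameters, and the proxy URL are hypothetical. The endpoint and the `x-api-key` / `anthropic-version` headers follow Anthropic's public Messages API; whether the Worker proxy expects the same request body is an assumption.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux; not needed on macOS
#endif

/// Build the request for a Claude call.
/// A non-empty user key goes directly to api.anthropic.com; otherwise the Worker proxy is used.
func makeClaudeRequest(userAPIKey: String?, workerProxyURL: URL, body: Data) -> URLRequest {
    let key = (userAPIKey ?? "").trimmingCharacters(in: .whitespacesAndNewlines)
    var request: URLRequest
    if !key.isEmpty {
        // Direct path: authenticate with the user's own key.
        request = URLRequest(url: URL(string: "https://api.anthropic.com/v1/messages")!)
        request.setValue(key, forHTTPHeaderField: "x-api-key")
        request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    } else {
        // Proxy path: the Worker holds the key, so no auth header is attached here.
        request = URLRequest(url: workerProxyURL)
    }
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = body
    return request
}
```

Keeping the branch in one function means the rest of the app does not need to know which path is active; clearing the key in the panel would simply route the next call back through the proxy.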
New dependencies (Xcode → File → Add Package Dependencies)
- https://github.com/microsoft/onnxruntime-swift-package-manager.git
- https://github.com/FluidInference/FluidAudio.git

Related
See #28 for LM Studio / Gemma 4 local model integration — complementary to this PR (on-device TTS/STT vs on-device LLM).
Test plan
🤖 Generated with Claude Code