A Node.js/TypeScript voice assistant built on the OpenAI Realtime API: speech input, chat, and TTS all happen inside a single realtime session. Wake‑word detection is fully local via Picovoice Porcupine.
- Local wake word: `jarvis` by default (or any supported built‑in keyword, or your own `.ppn` file).
- Gate window: after the wake word, mic audio is streamed to the Realtime API only inside a fixed‑length window.
- Local VAD (RMS): inside the window we drop silence/noise and can close the window early after `GATE_SILENCE_MS` ms of silence.
- Automatic end‑of‑utterance: the OpenAI side uses `semantic_vad` + `createResponse`, so the model decides when you have finished speaking.
- Barge‑in: saying the wake word while the assistant is speaking interrupts the current answer and starts a new turn.
- Protection from self‑hearing: while the assistant is speaking, mic audio is not streamed to the Realtime API.
- Node.js 18+
- OpenAI API key — for the Realtime API (`gpt-realtime`).
- Picovoice Porcupine AccessKey — for local wake‑word detection.
- Audio output (one of):
  - Linux: PipeWire — `pw-play` (normally present if you use PipeWire).
  - macOS: SoX — `brew install sox` (provides the `play` command).
```bash
cd voice-assistant
yarn install
cp .env.example .env
# Then edit .env and set:
# - OPENAI_API_KEY=sk-...
# - PORCUPINE_ACCESS_KEY=...
```

Build and run:

```bash
yarn build
yarn start
```

Or quick dev run (rebuild + run):

```bash
yarn dev
```

- On startup we create a `RealtimeAgent` and a `RealtimeSession` with model `gpt-realtime` and the config:
  - `audio.input.format = "pcm16"`
  - `audio.input.transcription.model = "gpt-4o-mini-transcribe"`
  - `audio.input.turnDetection` = `semantic_vad` + `createResponse`
  - `audio.output.format = "pcm16"` (24 kHz)
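The session configuration can be sketched as a plain object. The key names below follow this README; the exact shape expected by the SDK may differ, so treat this as illustrative:

```typescript
// Sketch of the session configuration described above. Key names mirror
// this README; check the SDK docs for the exact expected shape.
const sessionConfig = {
  model: "gpt-realtime",
  audio: {
    input: {
      format: "pcm16", // 16 kHz mono PCM16 from the mic
      transcription: { model: "gpt-4o-mini-transcribe" },
      // The model decides when the user has finished speaking and
      // automatically starts a response.
      turnDetection: { type: "semantic_vad", createResponse: true },
    },
    output: { format: "pcm16" }, // 24 kHz PCM16 piped to the player
  },
};
```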
- The mic reads 16 kHz mono PCM16 via `node-record-lpcm16`.
- Audio is fed into Porcupine locally. When the wake word is detected:
  - a gate window is opened for `WAKE_WINDOW_MS` ms;
  - we remember the time of the last detected speech.
- While the window is open:
  - all chunks from the mic are streamed via `session.sendAudio(...)`;
  - a simple RMS‑based VAD (`MIN_RMS`) tracks speech presence and can close the window early after `GATE_SILENCE_MS` ms of silence.
- The Realtime model itself detects end‑of‑utterance (`semantic_vad`) and starts speaking. `session.on("audio")` yields PCM chunks that are piped to an external player: on Linux `pw-play`, on macOS SoX `play` (24 kHz, mono, s16le).
- Start the assistant (`yarn start` or `yarn dev`).
- In the console you should see `Connected. Say wake-word to activate…` and `Wake-word active: jarvis` (or your keyword).
- Clearly say the wake word (for example "jarvis") and wait for `Wake word detected` in the logs (and/or a beep).
- Immediately after the wake word, say your command, e.g.:
  - “what’s the weather today”
  - “what can you do”
  - “tell me a joke”
- The assistant will transcribe your utterance and respond with synthesized speech.
- To interrupt the current answer and start a new one, say the wake word again.
See `.env.example` for the full list. Short summary:

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key (required). |
| `PORCUPINE_ACCESS_KEY` | Picovoice AccessKey (required for wake word). |
| `PORCUPINE_BUILTIN_KEYWORD` | Built‑in wake word (default `jarvis`). If not supported on your platform, the code will pick another supported keyword. |
| `PORCUPINE_KEYWORD_PATH` | Path to a custom `.ppn` file (takes priority over the built‑in keyword). |
| `WAKE_WINDOW_MS` | How many ms after the wake word to stream mic audio into the Realtime API (default `8000`). |
| `GATE_SILENCE_MS` | Close the window if there is no speech for this many ms (default `1200`). |
| `MIN_RMS` | RMS threshold for local VAD (default `200`). Lower = more sensitive to quiet speech. |
| `WAKE_DEBOUNCE_MS` | Debounce interval for the wake word (default `1500` ms). |
| `AUDIO_DEVICE` | Input device (mic). Linux: ALSA device for `arecord -D`. macOS: device via `AUDIODEV`. |
| `AUDIO_OUTPUT_DEVICE` | Output device (speakers). Linux: `pw-play` target (node id or name); find it via `wpctl status` or `pw-cli list-objects Node` (see Sinks). Default is the system output (often Bluetooth). On macOS leave unset. |
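As a hypothetical helper, the playback command described above (`pw-play` on Linux, SoX `play` on macOS, honoring `AUDIO_OUTPUT_DEVICE`) could be built like this. The flags shown are the common `pw-cat`/SoX ones; verify them against your installed versions, as this is not the project's actual code:

```typescript
// Illustrative builder for the external player command line.
// pw-play/SoX flag names are assumptions; check `man pw-cat` / `man play`.
function buildPlayerCmd(platform: string, outputDevice?: string): string[] {
  if (platform === "darwin") {
    // SoX: raw signed 16-bit mono at 24 kHz, read from stdin
    return ["play", "-q", "-t", "raw", "-r", "24000",
            "-e", "signed", "-b", "16", "-c", "1", "-"];
  }
  // Linux: pw-play at 24 kHz mono s16
  const cmd = ["pw-play", "--rate", "24000", "--channels", "1", "--format", "s16"];
  if (outputDevice) cmd.push("--target", outputDevice); // sink id or node name
  cmd.push("-"); // read PCM from stdin
  return cmd;
}
```

The returned array is suitable for `child_process.spawn(cmd[0], cmd.slice(1))`, with the assistant's PCM chunks written to the child's stdin.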
- Wake word is not always detected:
  - slightly decrease `MIN_RMS` (e.g. to `150`);
  - speak the wake word a bit louder and closer to the mic;
  - optionally reduce `WAKE_DEBOUNCE_MS` if you intentionally trigger the wake word frequently.
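The debounce behind `WAKE_DEBOUNCE_MS` amounts to ignoring repeated detections within the interval; a minimal sketch (names illustrative, not the project's actual API):

```typescript
// Illustrative wake-word debounce: detections arriving within
// WAKE_DEBOUNCE_MS of the last accepted one are dropped.
const WAKE_DEBOUNCE_MS = 1500; // mirrors the .env default

let lastWakeAt = -Infinity;
function acceptWake(now: number): boolean {
  if (now - lastWakeAt < WAKE_DEBOUNCE_MS) return false; // still debouncing
  lastWakeAt = now;
  return true;
}
```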
- Sometimes there is no answer:
  - make sure that after `Wake word detected` you actually speak within the window (`WAKE_WINDOW_MS`);
  - verify that `OPENAI_API_KEY` is valid and has access to `gpt-realtime`.
- No sound / playback errors:
  - Linux: ensure PipeWire is installed and `pw-play` is in `PATH`;
  - macOS: install SoX — `brew install sox` (provides the `play` command).
- On Linux sound goes to a Bluetooth speaker but you want wired USB speakers (e.g. Edifier):
  - List output devices: run `wpctl status` (or `pw-cli list-objects Node`). Under Audio → Sinks find your USB output (Edifier / USB Audio) and note its id (number) or node name.
  - In `.env` set `AUDIO_OUTPUT_DEVICE` to that id or name, e.g. `AUDIO_OUTPUT_DEVICE=42` or `AUDIO_OUTPUT_DEVICE=alsa_output.usb-0bda_4014-00.analog-stereo`.
  - Restart the assistant — audio will go to the selected sink.
- On Raspberry Pi the mic suddenly stops working after a reboot:
  - ALSA card indices can change between boots. Run `arecord -l` and check which card/device is your mic (e.g. `card 3: USB PnP Audio Device, device 0`).
  - Update `.env` accordingly, e.g. `AUDIO_DEVICE=plughw:3,0` (card 3, device 0).
  - Restart the assistant (systemd user service or `yarn start`).
- On Raspberry Pi with USB speakers the beep / beginning of the answer is sometimes cut off:
  - This is usually due to PipeWire suspending/rewiring the USB audio sink. You can reduce it by creating `~/.config/pipewire/pipewire.conf.d/10-no-suspend.conf` and disabling suspend:
    - Set `session.suspend-timeout-seconds = 0` to prevent auto‑suspend.
    - Optionally add a `context.rules` block for your sink's `node.name` (from `wpctl status` / `pw-cli`) and set a reasonable `node.latency` (e.g. `256/24000`).
  - Reboot the Pi so PipeWire picks up the new config.
- Process exits with `session_expired` after ~60 minutes:
  - Realtime sessions are limited to 60 minutes by the API. This project listens for `session_expired` and exits cleanly so a supervisor can restart it.
  - On Raspberry Pi, configure your `voice-assistant.service` (user unit) with `Restart=always` and `RestartSec=5`, then run `systemctl --user daemon-reload` and `systemctl --user restart voice-assistant`.
  - On macOS or when running manually, you can wrap `node dist/index.js` in a simple `while true; do ...; sleep 5; done` shell loop to auto‑restart.
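For the systemd route, a minimal user unit along these lines works; the paths and description are illustrative, so adjust `WorkingDirectory` and `ExecStart` to your install:

```ini
# ~/.config/systemd/user/voice-assistant.service (illustrative paths)
[Unit]
Description=Voice assistant

[Service]
WorkingDirectory=%h/voice-assistant
ExecStart=/usr/bin/node dist/index.js
Restart=always
RestartSec=5

[Install]
WantedBy=default.target
```

Enable it with `systemctl --user enable --now voice-assistant`.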
- `src/index.ts` — main assistant: Porcupine, wake word, gate window, local VAD, `RealtimeSession`, audio output.
- `src/audio-test.ts` — minimal Porcupine/wake‑word test without OpenAI.
- `src/types/external-modules.d.ts` — type declarations for modules without types (`node-record-lpcm16`, `@picovoice/porcupine-node`).
MIT