I’m going around in circles: every time I think I’ve found a solution, the hardware turns out to be EOL or discontinued, like the Matrix Voice or ReSpeaker. The more I read, the more my brain shuts down. I hope you can give me a little push in the right direction.
I’m building my own home voice assistant, and I want to go fully custom: I’m developing my own (locally hosted) server that handles all the speech processing. What I need on the device side is a wake-word-only system that relays the audio to my server in real time (as a stream or in chunks), including the audio captured just before the wake word. The server then streams back an audio response in real time, which should be played on my own speaker.
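To make the "including the part before the wake word" requirement concrete, here is a rough sketch of the buffering I have in mind on the device side. The class name `PreRollRelay`, the `send` callback and the chunk count are all made up for illustration. The wake word engine and the network transport are left out entirely, so this is only the buffering idea, not a working client.

```python
from collections import deque
from typing import Callable, Deque


class PreRollRelay:
    """Keeps the most recent audio chunks so they can be sent once the wake word fires.

    Hypothetical helper: `send` stands in for whatever transport is used
    (WebSocket, MQTT, HTTP chunks, ...).
    """

    def __init__(self, send: Callable[[bytes], None], pre_roll_chunks: int = 50):
        self._send = send
        # A bounded deque drops the oldest chunk automatically when full.
        self._pre_roll: Deque[bytes] = deque(maxlen=pre_roll_chunks)
        self._streaming = False

    def on_chunk(self, chunk: bytes, wake_word_detected: bool = False) -> None:
        """Feed one microphone chunk; forwards it only while streaming."""
        if self._streaming:
            self._send(chunk)
            return
        self._pre_roll.append(chunk)
        if wake_word_detected:
            # Flush the buffered audio first, including the chunk that
            # triggered the detection, then switch to live forwarding.
            for buffered in self._pre_roll:
                self._send(buffered)
            self._pre_roll.clear()
            self._streaming = True

    def stop(self) -> None:
        """Go back to buffering only, e.g. when the server ends the session."""
        self._streaming = False
```

With 20 ms chunks, `pre_roll_chunks=50` would keep roughly one second of audio from before the wake word. Both numbers are guesses on my part.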
I think I can turn to Rhasspy, but I’m not sure whether I can completely bypass its core functionality and use only the wake word mechanism (Porcupine seems great). I looked through the docs but couldn’t find a direct answer. Here’s what ChatGPT had to say about it:
To have Rhasspy relay the audio stream in real-time to your server, you’d need to customize its pipeline, as Rhasspy primarily handles wake word detection and local voice processing. Here’s a potential method to set it up:
Steps to Relay Audio from Rhasspy to Your Server:
- Custom Audio Forwarding Script:
  - You can modify Rhasspy’s pipeline by creating a custom Python or Node.js script that intercepts the audio as soon as the wake word is detected and forwards the audio to your server via a WebSocket or REST API.
  - Rhasspy has support for custom command scripts when a wake word is detected. You can hook into this to send the recorded audio to your server in real time.
- Enable Streaming via MQTT or WebSocket:
  - Rhasspy communicates using MQTT and WebSocket protocols for various events. You could set up Rhasspy to stream or send chunks of audio through one of these protocols.
  - Have your server subscribe to these events or streams, processing the audio in real time as it’s sent from Rhasspy.
- Use External Audio Handlers:
  - Instead of using Rhasspy’s internal speech-to-text processing, you could configure Rhasspy to act as an audio collector that pushes audio data to an external service (your server). You would need to modify its configuration to disable internal speech recognition and instead trigger a streaming action to your server when it detects the wake word.
- Rhasspy Remote HTTP Integration:
  - Rhasspy supports remote HTTP endpoints for speech recognition. After detecting the wake word, Rhasspy could send the recorded audio via HTTP POST to your server. Your server would then handle the processing of the audio in real time.
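If the MQTT route works the way I think it does, the server side might look roughly like the sketch below. I’m assuming the Hermes topic layout from the Rhasspy 2.5 docs, where each audio frame is published on `hermes/audioServer/<siteId>/audioFrame` as a small WAV-encoded chunk. I haven’t verified this against a running instance, and the two helper functions are my own names, not part of Rhasspy.

```python
import io
import wave
from typing import Optional, Tuple


def site_id_from_topic(topic: str) -> Optional[str]:
    """Return the site id if `topic` looks like an audio frame topic, else None.

    Assumed layout: hermes/audioServer/<siteId>/audioFrame
    """
    parts = topic.split("/")
    if (
        len(parts) == 4
        and parts[0] == "hermes"
        and parts[1] == "audioServer"
        and parts[3] == "audioFrame"
    ):
        return parts[2]
    return None


def pcm_from_wav_chunk(payload: bytes) -> Tuple[int, int, bytes]:
    """Decode one WAV-encoded chunk into (sample_rate, channels, raw PCM bytes)."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return wav.getframerate(), wav.getnchannels(), pcm
```

These would be called from the message callback of an MQTT client (for example paho-mqtt’s `on_message`, which receives `msg.topic` and `msg.payload`), with the raw PCM then handed to my own speech pipeline.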
Is this accurate?