Satellite/client setup confusion - gstreamer with wakeword doesn't work

Just found this wonderful project while trying to switch away from mycroft.ai (I was trying to set up something based on porcupine, Julius, and node-red - but why reinvent the wheel if there is already a good community?).

My setup here has an old ThinkPad X230 with a 3rd-gen Intel i5 and 4 GB RAM as the home gateway. I want to run most of Rhasspy there (currently I'm still testing it in a venv on my desktop, where it works great with porcupine as the wakeword engine).

I have several Raspberry Pi 3s with Google's AIY kit mic, another X230 (as a media center with a mic attached), and an old Intel NUC with a good USB mic, spread over various rooms. I also work a lot with Raspberry Pi Zeros, ESP8266s, and ESP32s, so these might be options for satellites too.

I wonder how I could minimize the software I install on all these satellite computers. How can I just set up a wakeword engine on them and then stream to Rhasspy on my home server? I still want to keep track of the source of each voice command somehow, so answers can be sent back.
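Roughly what I have in mind for a satellite, as a minimal sketch: run only a mic capture loop and push the audio to the server tagged with a site ID. This assumes the Hermes-style MQTT audio topics (the protocol Snips used and Rhasspy later adopted); the host name and `SITE_ID` are placeholders for my setup, and I haven't verified this end to end:

```python
# Minimal satellite sketch: capture mic audio and publish it over MQTT,
# tagged with a site ID so the server knows which satellite to answer.
# Assumes the Hermes-style topic hermes/audioServer/<siteId>/audioFrame;
# SERVER and SITE_ID are placeholders.
import io
import wave

import paho.mqtt.client as mqtt
import pyaudio

SITE_ID = "livingroom"       # placeholder: unique per satellite
SERVER = "homeserver.local"  # placeholder: the machine running Rhasspy
RATE, CHUNK = 16000, 2048    # 16 kHz mono, 16-bit PCM

client = mqtt.Client()  # paho-mqtt 1.x style constructor
client.connect(SERVER, 1883)
client.loop_start()

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)

topic = f"hermes/audioServer/{SITE_ID}/audioFrame"
while True:
    pcm = stream.read(CHUNK, exception_on_overflow=False)
    # Each chunk is shipped as a tiny WAV file, which is what the
    # Hermes audioFrame messages carry.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(RATE)
        wav.writeframes(pcm)
    client.publish(topic, buf.getvalue())
```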

I tried using gstreamer locally, but enabling it seems to break the wakeword engine - I might also not understand how to separate the two. Am I missing something here, or is there an easier way?

Is such a setup currently possible (if so, how?), or do I have to install Rhasspy on each satellite (and clone their configuration) for now and just connect to different node-red sockets on my home server?

Yay, a fellow AIY kit user! As I understand it, the wake word recognition is done locally on the satellite, and your command is then chunked and sent to the server for processing. You can have multiple satellites with unique IDs, so your server will know which one to respond back to.
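If it helps, the routing side of that is simple to sketch: the server can subscribe to all satellites at once and read the site ID straight out of the topic. Again this assumes the Hermes-style topic layout, so treat it as an illustration rather than tested code:

```python
# Server-side sketch: listen to audio frames from all satellites and
# read the site ID out of the topic, so replies can be routed back.
# Assumes the Hermes-style topic layout; the broker host is a placeholder.
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Topic looks like: hermes/audioServer/<siteId>/audioFrame
    site_id = msg.topic.split("/")[2]
    print(f"got {len(msg.payload)} bytes of audio from satellite '{site_id}'")

client = mqtt.Client()  # paho-mqtt 1.x style constructor
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("hermes/audioServer/+/audioFrame")
client.loop_forever()
```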

Rhasspy Read the Docs - gstreamer, FYI, if you hadn't come across that already.
And I did see much discussion about gstreamer over on the Home Assistant forum from last July.
My system works (Pi 3B w/ AIY Voice V1 as a satellite and an i3 NUC as the server - both running the Docker images), and I didn't know anything about gstreamer until today.

Good luck in your adventure. I’ve been enjoying mine.

Yep, I ended up setting up several full Rhasspy installations and just cloning my grammar.
I haven't set it up on the AIY kit yet - so far the recognition gets worse and worse the bigger I make my grammar. I can only use fuzzywuzzy; openfst seems to just detect random things (in the best cases it gets the command minus the first word and fails to detect the intent). Fuzzywuzzy guesses correctly about 1 time in 2, but that's of course not too good either.
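To illustrate the fuzzywuzzy problem: it just returns the closest sentence in the grammar, so the bigger the grammar gets, the more near-duplicates there are to collide with - especially once the first word is missing. A quick sketch with made-up example sentences:

```python
# Quick illustration of why fuzzy matching degrades as the grammar grows:
# fuzzywuzzy returns the closest known sentence, and near-duplicates in a
# big grammar start to collide. The sentences below are made-up examples.
from fuzzywuzzy import process

grammar = [
    "turn on the kitchen light",
    "turn on the living room light",
    "turn off the kitchen light",
    "set the kitchen light to fifty percent",
]

# Porcupine ate the first word, so only this arrives:
heard = "on the kitchen light"
match, score = process.extractOne(heard, grammar)
print(match, score)  # "turn on ..." vs. "turn off ..." is a close call
```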

I am using porcupine as the wakeword engine - I wonder if I should test with pocketsphinx, since it seems that porcupine prevents the first word of my command from being detected.

@chunkking_mann You are getting good results on AIY - even with something like a washing machine running in the background?

@ulno Alas, I am getting a good response only in the sense that "it wakes up when I say the wakeword" - using snowboy, BTW.
I haven't had time to make it "do" anything yet - lack of time and a steep learning curve for all the other bits involved. For one, I have to speak fairly loudly. I'm not sure if that's down to the sound levels of the sample recordings. I've adjusted alsamixer levels and submitted several different samples, but I don't get the response I want without nearly shouting at it yet. So it's still in the quiet room. But it does ignore the TV when it's on.
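One thing that might take the guesswork out of the alsamixer fiddling is watching the raw input level while adjusting. A small standalone sketch (nothing Rhasspy-specific, just pyaudio and the stdlib audioop module):

```python
# Tiny mic level meter: prints the RMS of each captured chunk so you can
# see whether alsamixer changes actually raise the input signal.
import audioop

import pyaudio

RATE, CHUNK = 16000, 1024

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)

while True:
    data = stream.read(CHUNK, exception_on_overflow=False)
    rms = audioop.rms(data, 2)   # 2 = bytes per sample (16-bit audio)
    print("#" * (rms // 200))    # crude bar graph of the input level
```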

I switched to kaldi as the speech-to-text engine and have much better results - I even managed once or twice to use it on a server remotely (there is an option in the settings), but then it broke for reasons I cannot reconstruct (the connection keeps getting interrupted, even over Ethernet).

The detection rate of my sentences when using the recording function on the webpage is now nearly 100%. When using the porcupine wakeword engine, it often eats my first words (when they are fast ones like "tell", "set", or "switch" - "turn" and "make" do work as start words).
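My workaround idea for the eaten first words is to keep a short ring buffer of recent audio and prepend it when the wake word fires, so nothing between the trigger and the recorder starting gets lost. A sketch with pvporcupine (the half second of pre-roll is a guess, and newer pvporcupine versions also require an access key):

```python
# Sketch of a pre-roll buffer: keep the last ~0.5 s of audio around so
# the first word of the command isn't lost while the wakeword triggers.
# Note: newer pvporcupine versions also require an access_key argument.
import collections
import struct

import pvporcupine
import pyaudio

porcupine = pvporcupine.create(keywords=["porcupine"])

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1,
                    rate=porcupine.sample_rate, input=True,
                    frames_per_buffer=porcupine.frame_length)

# Ring buffer holding roughly half a second of raw frames.
preroll_frames = int(0.5 * porcupine.sample_rate / porcupine.frame_length)
preroll = collections.deque(maxlen=preroll_frames)

while True:
    raw = stream.read(porcupine.frame_length, exception_on_overflow=False)
    preroll.append(raw)
    pcm = struct.unpack_from("h" * porcupine.frame_length, raw)
    if porcupine.process(pcm) >= 0:
        # Wake word heard: start the command with the buffered audio
        # instead of only what comes after the detection.
        command_audio = b"".join(preroll)
        print(f"wake word detected, prepending {len(command_audio)} bytes")
```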

I will try snowboy now and see if that also butchers my recordings.

Reducing the throwaway buffer in the VAD helped. It seems like kaldi + openfst or Mycroft Adapt is the way to go. Streaming works with neither gstreamer nor the network option (the latter crashes both the client and the server). So, as the Celeron in my NUC is too old (I also had huge trouble running Mycroft on it), I will give up on it as a satellite and try to employ a Raspberry Pi there.
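For reference, the throwaway-buffer idea in code: a VAD loop that keeps a little pre-speech padding and prepends it instead of discarding it. A sketch with webrtcvad (frames must be 10, 20, or 30 ms for it to accept them; the padding length is a guess):

```python
# Sketch of VAD with pre-speech padding: instead of throwing away audio
# captured before speech is confirmed, keep a few frames and prepend
# them, so quick first words like "tell" or "set" survive.
import collections

import pyaudio
import webrtcvad

RATE = 16000
FRAME_MS = 30
FRAME_SAMPLES = RATE * FRAME_MS // 1000  # 480 samples = 960 bytes

vad = webrtcvad.Vad(2)  # aggressiveness 0-3

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                    input=True, frames_per_buffer=FRAME_SAMPLES)

padding = collections.deque(maxlen=5)  # ~150 ms of pre-speech audio
voiced = []

while True:
    frame = stream.read(FRAME_SAMPLES, exception_on_overflow=False)
    if vad.is_speech(frame, RATE):
        if not voiced:
            voiced.extend(padding)  # don't lose the start of the utterance
        voiced.append(frame)
    elif voiced:
        # A single silent frame ends the utterance here; real code would
        # wait for a longer stretch of silence before cutting off.
        utterance = b"".join(voiced)
        print(f"utterance finished: {len(utterance)} bytes")
        voiced = []
    else:
        padding.append(frame)
```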