Pi Zero + Jabra: need help to improve Snowboy performance

KiboOst · May 12, 2020, 12:55pm

I have very good results with it and Snowboy.

tuxedo78 · May 12, 2020, 1:52pm

Thanks @KiboOst for your quick answer.

Could you please share your Snowboy settings (sensitivity and audio_gain), your alsamixer capture gain, and your distance to the Pi HAT when recording the wake word? I may give it a try on one of my satellites in the next few days…

3issa · May 12, 2020, 3:36pm

I use a ReSpeaker 2-mics and once I have restarted Rhasspy it works like a charm. I get very good results. But that doesn’t last… After a while, Snowboy doesn’t respond anymore and I have to restart Rhasspy again.
I also use a ReSpeaker 2-mics with an RPi 3A+ and I have no problem there. Everything is perfect.

Actually, I had considered trying Rhasspy with snips-hotword, but now I’m wondering whether to use only RPi 3 boards as satellites.

rolyan_trauts · May 12, 2020, 3:58pm

I think @tuxedo78 sorted it out: the configuration web server causes a lot of load.

fastjack · May 12, 2020, 4:20pm

It also depends on the Snowboy model.

Apart from “snowboy”, all the other “universal” models (alexa, jarvis, etc.) use applyFrontend=true, which drastically increases the CPU load.

To reduce the load on a Pi0, maybe try a custom model or the universal “snowboy” model…
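If you switch between models a lot, a tiny helper can encode the rule of thumb above. This is a hypothetical sketch: `needs_frontend` is not part of Snowboy or Rhasspy, and it simply assumes that personal `.pmdl` models and the universal `snowboy.umdl` run without the frontend while the other universal `.umdl` models need it.

```python
from pathlib import PurePath


def needs_frontend(model_path):
    """Hypothetical rule of thumb from this thread, not an official Snowboy API.

    Personal .pmdl models and the universal snowboy.umdl run without the
    frontend; the other universal .umdl models (alexa, jarvis, ...) need it,
    which is what makes them heavy on a Pi Zero.
    """
    path = PurePath(model_path)
    if path.suffix.lower() != ".umdl":
        return False  # personal model: frontend off
    return path.name.lower() != "snowboy.umdl"


for model in ["snowboy.umdl", "jarvis.umdl", "alexa.umdl", "ok_jarvis.pmdl"]:
    print(f"{model}: apply frontend = {needs_frontend(model)}")
```

The model file names in the loop are only examples; the point is that anything returning `True` is a poor fit for a Pi0.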

tuxedo78 · June 17, 2020, 12:37pm

I’m coming back to this old topic to share my results with everyone.

First, I have to say that I reached a very good level of performance… even if it took me a lot of time and trial and error.

Regarding the Jabra configuration, I concluded that I got the best detection performance at the maximum capture level in alsamixer (+6 dB on the Jabra 410). Interestingly, unlike ReSpeaker devices, the Jabra does not “record” background noise in quiet environments (probably because of internal VAD thresholds). This is obvious when you open the recorded WAV file in Audacity: it’s completely flat until you start speaking.

Second, I got the best detection performance (coverage + accuracy) when recording fairly close to the Jabra (around 20-30 cm) without saturating the microphone. This means the waveform (opened in Audacity) has to be as high as possible but must never reach the 1.0 mark on Audacity’s Y axis. I also tried Audacity’s Amplify effect on lower-volume recordings, but it did not give good results at all. For the record, my recordings made at more than 1 meter gave poorer results.
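Instead of eyeballing each file in Audacity, you could check the peak level of your training WAVs with a short script. This is a sketch assuming 16-bit PCM files (what Snowboy training recordings usually are); the 0.99 “clipped” and 0.3 “too quiet” thresholds are hypothetical values to tune for your own mic and room.

```python
import array
import sys
import wave


def peak_of_frames(raw):
    """Peak absolute value of 16-bit little-endian PCM, normalized to 0.0-1.0."""
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    # Dividing by 32768 maps the most negative sample (-32768) to exactly 1.0.
    return max(abs(s) for s in samples) / 32768


def peak_level(path):
    """Peak level of a 16-bit PCM WAV file (0.0-1.0, like Audacity's Y axis)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM")
        return peak_of_frames(wav.readframes(wav.getnframes()))


def classify_peak(peak, clip_at=0.99, too_quiet=0.3):
    """Hypothetical thresholds: adjust them to your own setup."""
    if peak >= clip_at:
        return "probably clipped: move back or lower the capture gain"
    if peak < too_quiet:
        return "quiet: move closer to the microphone"
    return "ok"


if __name__ == "__main__":
    for wav_path in sys.argv[1:]:
        peak = peak_level(wav_path)
        print(f"{wav_path}: peak {peak:.2f} -> {classify_peak(peak)}")
```

Usage would be something like `python3 check_peaks.py rec1.wav rec2.wav rec3.wav` before uploading the recordings.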

The third thing is that the sensitivity is very… sensitive! I had to go to the second decimal place to find the best trade-off between coverage (the distance from the microphone at which the wake word is still detected) and false detections. I even saw differences between 0.425 and 0.43!
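When stepping the sensitivity in increments as small as 0.005, repeatedly adding floats accumulates rounding error, so a sweep is easier to script with `decimal.Decimal`. A minimal sketch; the default range and step are only examples, and the values are produced as strings because that is how sensitivities are usually written in configs.

```python
from decimal import Decimal


def sensitivity_grid(start="0.40", stop="0.50", step="0.005"):
    """Yield sensitivity candidates as strings, without float rounding drift."""
    value, end, inc = Decimal(start), Decimal(stop), Decimal(step)
    while value <= end:
        # normalize() drops trailing zeros, so 0.430 is printed as "0.43"
        yield str(value.normalize())
        value += inc


print(list(sensitivity_grid("0.42", "0.45")))
```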

During the optimization process, I created two .pmdl files from the same WAV recordings. I found this useful for comparing coverage vs. detection across different settings and for converging quickly (or at least faster) on the best values.
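Comparing settings this way is mostly bookkeeping. Here is a hypothetical sketch of how test sessions could be recorded and compared: each `Trial` notes the farthest distance at which the wake word was detected and how many false triggers occurred, and `best_setting` picks the longest range among settings that stay under a false-detection budget. The names and fields are illustrative, not part of any Rhasspy tool.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One test session for a given setting (hypothetical bookkeeping)."""
    sensitivity: str
    max_distance_m: float   # farthest distance at which the wake word was detected
    false_detections: int   # false triggers observed during the session


def best_setting(trials, max_false=0):
    """Best coverage among trials with at most `max_false` false detections."""
    acceptable = [t for t in trials if t.false_detections <= max_false]
    if not acceptable:
        return None
    return max(acceptable, key=lambda t: t.max_distance_m)
```

Fill in one `Trial` per setting you test (per model, if you keep two .pmdl files side by side), then relax `max_false` if nothing qualifies.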

Finally, boosting audio_gain to 5.0 increased coverage (6.0 gave additional false detections).
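One plausible way to picture the audio_gain trade-off: assuming the gain acts roughly as a fixed multiplier on the 16-bit input samples (a simplification; this sketch does not reproduce Snowboy’s internals), quiet distant speech gets louder, but so does noise, and loud sounds get pinned at the int16 limits. The helper names below are hypothetical.

```python
def apply_gain(samples, gain):
    """Scale 16-bit samples by `gain`, clamping the result to the int16 range."""
    return [max(-32768, min(32767, int(round(s * gain)))) for s in samples]


def clipped_fraction(samples, gain):
    """Fraction of samples that end up pinned at the int16 limits after the gain."""
    boosted = apply_gain(samples, gain)
    if not boosted:
        return 0.0
    pinned = sum(1 for s in boosted if s in (-32768, 32767))
    return pinned / len(boosted)
```

For example, a sample of 8000 multiplied by 5.0 would be 40000, which no longer fits in 16 bits and gets clamped to 32767.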

At the moment I use “ok Jarvis” as the wake word with audio_gain=5.0 and sensitivity=0.43, with similar performance on both the Pi Zero and the Pi 3B+. Furthermore, with my Quart “optimizations” (call it a workaround if you prefer), the CPU load is quite reasonable (around 15%).

Hopefully this can be useful to others. No doubt my results are specific to my own setup, but parts of the approach could be reused.

Pep · June 19, 2020, 6:37am

Which wake word are you talking about? The one bundled with Rhasspy, jarvis.umdl?

tuxedo78 · June 19, 2020, 7:18am

No, it’s a personal wake word.

Pep · June 19, 2020, 7:30am

Ok, I understand.

I have also managed to get good results with a Pi Zero and a personal wake word, thanks to your web server workaround.

I’m currently trying to set up two different models based on the same wake word recorded by two different people, and I’m running into some difficulties. It’s quite hard to find settings that trigger only one model and not the other.

Did you manage to do that? I think it only needs fine-tuning, but that requires a lot of testing and my wife doesn’t have my patience…

HorizonKane · June 19, 2020, 1:59pm

I am currently running a satellite with the standard Snowboy model as the wake word and it is… not really satisfying. Sometimes it reacts instantly, sometimes it takes something like 30 seconds to react, and if I try several times in between, it ends up reacting multiple times and presenting me with a bing-bing orchestra.

tuxedo78 · June 19, 2020, 2:01pm

Standard (universal) models are usually too heavy, depending on your hardware… I recommend using a personal wake word. It should make a big difference.

HorizonKane · June 19, 2020, 2:03pm

It’s a Pi Zero, I forgot to mention that.

Snips was running fine on it, but I wanted to create my own wake word anyway, so I will give it a try.

fastjack · June 19, 2020, 2:19pm

I think the only Snowboy universal model usable on a Pi0 is “snowboy”. All the others use applyFrontend, which uses much more resources.

Personal wake words will be much more CPU-friendly.

HorizonKane · June 19, 2020, 5:38pm

Currently experimenting with a personal model; it’s tricky. I need to make a better recording ^^

Question: Can’t I just set Hotword on the satellite to MQTT and let the master perform the detection?

koan · June 19, 2020, 5:50pm

You can, but then your satellites have to continuously stream audio data to the master so it can detect the hotword.
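To put a number on “continuously stream audio”: assuming the usual Rhasspy format of 16 kHz, 16-bit, mono PCM, each satellite sends a constant stream even when nobody is talking. The arithmetic below ignores MQTT and WAV framing overhead, so real traffic is somewhat higher.

```python
def stream_bytes_per_second(rate_hz=16000, sample_width_bytes=2, channels=1):
    """Raw PCM bytes per second for one continuously streaming satellite."""
    return rate_hz * sample_width_bytes * channels


bps = stream_bytes_per_second()
print(f"{bps * 8 / 1000:.0f} kbit/s, {bps * 3600 / 1e6:.1f} MB per hour")
# 256 kbit/s, 115.2 MB per hour
```

That is small for a wired LAN but adds up on Wi-Fi with several satellites, which is one reason to keep wake word detection on the satellite.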

HorizonKane · June 19, 2020, 6:10pm

I see.

I’ll experiment a bit more. I tried using UDP to keep the audio on the satellite, but it’s not working. The documentation differs from the interface: it only shows a port to enter, but in Rhasspy I am asked for a host and a port for both Snowboy and audio recording. What host should I enter there? 127.0.0.1 seems like a good bet to me, since the audio should stay on that host ^^

koan · June 19, 2020, 6:12pm

You can leave the host field empty.
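For intuition about the host field, here is a generic Python socket sketch of sending an audio chunk over UDP on the loopback interface. It does not reproduce Rhasspy’s own UDP audio format, and it does not show how Rhasspy interprets an empty host; it only illustrates that 127.0.0.1 keeps traffic on the same machine, while binding a socket to `""` in Python listens on all interfaces.

```python
import socket


def send_and_receive_chunk(chunk, port=0):
    """Send one audio chunk over UDP on the loopback interface and read it back.

    port=0 lets the OS pick a free port; a real setup would use a fixed one.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # "127.0.0.1" keeps the traffic on this machine; binding to ""
        # would instead listen on every network interface.
        receiver.bind(("127.0.0.1", port))
        receiver.settimeout(1.0)
        sender.sendto(chunk, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()
```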

HorizonKane · June 21, 2020, 8:37am

Meh, I’m getting a little upset.

I used the Jabra to record a new hotword in good quality, and everything was working fine. Then, all of a sudden, no more wake-up, after it had been working great for half the day. Playing around with the sensitivity settings: no more good results. Going back to the old settings: no wake-up.

The experience is so extremely inconsistent that I have the feeling I’m missing something.

For example: when Rhasspy restarts, is there a delay (a few minutes) before hotword detection becomes available again? Otherwise I can’t explain the behaviour I’m observing. Sometimes I give up in frustration and just leave it, and 30 minutes later I get a false positive. Then I give my wake word a try and it works well.

Right now the stupid thing reacts when my wife drops something in the kitchen but never when I say the hotword, no matter how clearly and loudly I pronounce it. The same hotword that was working great hours before.

Will keep on trying

HorizonKane · June 21, 2020, 9:27am

Update: I just asked my wife to record the hotword and now it is working much better.

HorizonKane · June 22, 2020, 9:35am

I made a big step forward, and I have to admit that following what tuxedo said was the best approach:

I recorded the wake word three times with Audacity, taking care that the waveform is strong but does not hit 1 on the Y axis.
I uploaded those three samples to Snowboy (the wake word is now “ok polly”).
And, very important since I don’t use a Jabra but PS Eye and Seeed mics on my master and satellites: I went to the advanced options and added audio_gain (between 3 and 5).

It seems to be working pretty well, though it still needs some fine-tuning, of course.

Update:

Running almost perfectly now with the above settings, but with:

audio_gain: 5 for the PS Eye camera on the master
audio_gain: 1 for the Seeed 2-mic HATs on the satellites
sensitivity: 0.48 on all
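If you maintain several devices, a small script can keep these per-device values in one place. This sketch assumes the Rhasspy 2.5 profile layout `wake.system` / `wake.snowboy.{model, audio_gain, sensitivity}`; check the key names and value types against your own `profile.json`. The model file name `ok_polly.pmdl` is a hypothetical placeholder.

```python
import json

# Values reported above; the device names are just labels.
SETTINGS = {
    "master (PS Eye)": {"audio_gain": 5, "sensitivity": "0.48"},
    "satellite (Seeed 2-mic HAT)": {"audio_gain": 1, "sensitivity": "0.48"},
}


def snowboy_profile_fragment(audio_gain, sensitivity, model="ok_polly.pmdl"):
    """Build a profile.json fragment, ASSUMING the Rhasspy 2.5 wake.snowboy layout."""
    return {
        "wake": {
            "system": "snowboy",
            "snowboy": {
                "model": model,  # hypothetical file name
                "audio_gain": audio_gain,
                "sensitivity": sensitivity,
            },
        }
    }


for device, values in SETTINGS.items():
    print(device)
    print(json.dumps(snowboy_profile_fragment(**values), indent=2))
```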

So far no false positives and reacting almost every time I say “Okay Polly”.