Pi-zero+Jabra : need help to improve snowboy performance

Hi all,

For the last week I’ve been playing with Rhasspy 2.5.0-pre, using Snowboy for wake word detection on my Pi-zero satellite (running Buster) connected to a Jabra 410. For information, my Rhasspy master (base) runs in a Debian 9 Stretch VM under Proxmox (where my Snips master is still running as well).

Even if everything is mostly OK so far, I’m still disappointed by Snowboy’s performance/accuracy compared to the Snips hotword from snips-satellite. I’m using exactly the same HW, so the hardware should not be the cause of the weakness.

With Snips, I have low CPU usage (around 15%) and excellent detection even at 3-4 meters from the Jabra without speaking loudly. When my apartment is very quiet, I’m even detected by several Snips satellites (including one at 7-8 meters).

With Snowboy, I don’t get stable performance, although it’s not so simple to describe.
In a quiet room, it usually works well for the first 2 or 3 detections. If I say the wake word again some time later, it’s hardly detected, and I have to repeat it with no guarantee of success.
In a noisier room (living room with kids playing), I get quite a few (and sometimes a lot of) false detections, and if I lower the sensitivity then I’m not even detected anymore when I say the wake word :frowning:

By disabling the Rhasspy webserver, the CPU load is quite OK: 15-20%, mainly due to the snowboy process, which is the expected load (discussed in another post). But the overall results are still disappointing.

So far, I have been playing a lot with the following tuning parameters:

  • wakeword - I tried “Jarvis”, “Hey snips”, “Ok google” and did not get very different performance among the 3
  • loudness at recording - I tried recording the wakeword between 20cm and 1.5 meter from the Jabra but no major difference
  • wakeword samples optimization - Even if SnowboyCustomMaker done by @KiboOst is doing a good job, I noticed that it sometimes cuts a bit too much at both the start and the end. Therefore I tried with arecord + Audacity, doing noise reduction and amplification, following the advice of @ced_cox on his blog
  • sensitivity setting - between 0.35 and 0.5 but impossible to get a good trade-off between lack of detection and false detections
  • audio_gain - between 1 and 3 but no clear impact of the setting on the same trade-off
  • alsamixer capture setting - between 0 and +6dB (Jabra 410 has a 3dB step, hence the value can be 0, 3 or 6)
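
To give an idea of how many combinations those ranges represent, here is a small sketch that just enumerates them. The ranges come from the list above; the step sizes (0.05 for sensitivity, 1.0 for audio_gain) and all the names are my own assumptions, and nothing here talks to Rhasspy or Snowboy.

```python
from itertools import product

# Ranges taken from the list above; the step sizes are an assumption.
SENSITIVITIES = [0.35, 0.40, 0.45, 0.50]
AUDIO_GAINS = [1.0, 2.0, 3.0]
CAPTURE_DB = [0, 3, 6]  # the Jabra 410 moves in 3 dB steps


def settings_grid():
    """Return every (sensitivity, audio_gain, capture_db) combination."""
    return [
        {"sensitivity": s, "audio_gain": g, "capture_db": c}
        for s, g, c in product(SENSITIVITIES, AUDIO_GAINS, CAPTURE_DB)
    ]


if __name__ == "__main__":
    grid = settings_grid()
    print(f"{len(grid)} combinations to try")
    for combo in grid[:3]:
        print(combo)
```

Even with these coarse steps the grid is large enough that testing each combination by voice takes a while.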

On the Snowboy website, I have read that USB microphones (the Jabra 410 should be in this category) are a bit weak in recording volume. I would agree with this, because I have the feeling that I get slightly better results when increasing the alsamixer capture gain, but this is still not at the level of the Snips hotword’s out-of-the-box (default settings) performance.

I’m interested in your experience and recommendations on how to optimize, and whether you find your Snowboy performance “really good” or simply “good but could be better”…

By the way, I’m also a bit surprised to see that Kitt.ai has decided to shut down all their products (including Snowboy) at the end of the year. As we are already in May, I’m afraid that my optimization efforts are not worth it if we all have to switch to another tool for wakeword detection…

Sorry for the long message… your help is appreciated :slight_smile:

fx

Hi,

If you can hear your voice clearly when you do an arecord test, the problem is not the mic or the sound but PocketSphinx.
I never managed to get it to work properly.

Did you try Snowboy? There is a Snowboy-CustomMaker here: https://kiboost.github.io/jeedom_docs/other/Rhasspy/SnowboyCustomMaker/

Ced

If you have Snips running, it is also possible to trigger the Snips wakeword for rhasspy.
Just set wakeword to Hermes

I am also interested to know if some people manage to get Snowboy working on a pi zero reliably over the long term.

I don’t…

@3issa which kind of microphone did you connect to your Pi-zero? USB, any seeed product, …? Did you consider going back to snips_hotword?

@KiboOst please could you comment on your experience with the 2-mics pi hat + pi-zero?

I also have a 2-mics pi hat (from my 1st Snips experience) that I could use again if someone shares their global settings (alsa, snowboy, etc.) and is extremely happy with the solution :wink:

I have very good results with it and Snowboy.

Thanks @KiboOst for your quick answer.

Please could you share your snowboy settings (sensitivity and audio_gain), alsamixer capture gain and distance to the pi hat when recording the wake word? I may give it a try on one of my satellites in the next days…

I use a respeaker 2-mics and once I have restarted Rhasspy it works like a charm. I have very good results. But that doesn’t last… After a while, Snowboy doesn’t answer anymore and I have to restart Rhasspy again.
I also use a respeaker 2-mics with a rpi A3+ and I have no problem. Everything is perfect.

Actually, I considered trying Rhasspy with snips-hotword, but now I’m wondering whether to use only RPi 3 as satellites.

I think @tuxedo78 sorted it and the config webserver causes much load.

It also depends on the Snowboy model.

Apart from “snowboy”, all the other “universal” models (alexa, jarvis, etc.) use applyFrontend=true, which drastically increases the CPU load.

To reduce the load on Pi0, maybe try with a custom model or use the universal “snowboy” model…

I’m coming back to this old topic to share my outcomes with everyone.

First, I have to say that I reached a very good performance level… even if it took me a lot of time and trials :sweat_smile:

About the Jabra configuration, I came to the conclusion that I was getting the best detection performance at maximum capture level in alsamixer (+6dB on Jabra 410). It’s interesting to notice that, unlike Respeaker devices, the Jabra device does not “record” background noise in quiet environments (likely because of internal VAD thresholds). This is obvious when you open the recorded wave file in Audacity: it’s totally flat until you start speaking.

Second, I got the best detection performance (coverage + accuracy) when recording close enough (around 20-30 cm) to the Jabra without saturating the microphone. It means that the wave curve (opened in Audacity) has to be as high as possible but never reach the 1.0 mark on the Audacity Y axis. I also tried using the Amplification feature from Audacity (with lower-volume recordings), but it was not giving good results at all. For the record, my recordings at 1+ meter were giving poorer results.
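
If you prefer not to open every recording in Audacity, the same check can be scripted. The sketch below is only an illustration using the Python standard library: it assumes 16-bit PCM WAV files (what arecord typically writes with -f S16_LE), and the 0.5 / 0.99 thresholds as well as the function names are arbitrary choices of mine, not values from Snowboy.

```python
import array
import sys
import wave


def peak_level(path):
    """Return the peak amplitude of a 16-bit PCM WAV file, scaled to 0.0-1.0."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768.0


def describe(peak, low=0.5, high=0.99):
    """Classify a peak level; the default thresholds are arbitrary assumptions."""
    if peak >= high:
        return "clipping: move back or lower the capture gain"
    if peak < low:
        return "quiet: record closer or raise the capture gain"
    return "ok"
```

Calling `describe(peak_level("sample1.wav"))` (the file name is a placeholder) on each sample gives a quick hint whether a recording is too quiet or saturated, which corresponds to the "as high as possible but below 1.0" rule above.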

The third thing is that the sensitivity is very… sensitive! I mean that I had to go to the 2nd decimal in order to find the best trade-off between coverage (distance to the microphone while being detected) and false detections. I even see differences between 0.425 and 0.43!

During the optimization process, I created 2 pmdl files with the same wav records as input. I found it useful to compare the coverage vs detection with different settings and quickly (at least faster) converge to the best values.
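
To keep track of such comparisons, a tiny helper like the one below can be handy. It is only a sketch: the `Trial` fields and the "highest hit rate within a false-alarm budget" rule are my own assumptions about how to score a setting, and you have to fill in the counts yourself from your own tests.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    sensitivity: float
    spoken: int        # times the wake word was actually said
    detected: int      # how many of those were caught
    false_alarms: int  # triggers when nobody said the wake word


def best_trial(trials, max_false_alarms=0):
    """Pick the trial with the highest hit rate among those within the false-alarm budget."""
    ok = [t for t in trials if t.false_alarms <= max_false_alarms and t.spoken > 0]
    if not ok:
        return None
    return max(ok, key=lambda t: t.detected / t.spoken)
```

Run one `Trial` per sensitivity value (or per pmdl file) under the same conditions, then let `best_trial` choose; raising `max_false_alarms` shows how much coverage you gain by tolerating a few false detections.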

Last thing is that I increased coverage when boosting the audio_gain to 5.0 (6.0 was giving additional false detections).

At the moment I use “ok Jarvis” as wakeword with audio_gain=5.0 and sensitivity=0.43, with similar performance on both Pi-zero and Pi 3B+. Furthermore, with my Quart “optimizations” (call it a workaround if you prefer :slight_smile:), the CPU load is very decent (around 15%).
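
For reference, those two values could sit in the profile roughly as sketched below. The key names are assumed to match the Rhasspy 2.5 profile layout (please check them against your own profile.json), and the model file name is a made-up placeholder.

```python
import json

# Key names are assumed to follow the Rhasspy 2.5 profile layout; verify them
# against your own profile.json before copying anything.
wake_settings = {
    "wake": {
        "system": "snowboy",
        "snowboy": {
            "model": "snowboy/ok_jarvis.pmdl",  # hypothetical file name
            "sensitivity": "0.43",  # value from the post above
            "audio_gain": 5.0,      # value from the post above
        },
    }
}

print(json.dumps(wake_settings, indent=2))
```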

Hopefully this can be useful to others. No doubt that my results are specific to my own setup but some pieces of the approach could be reused.

Which wakeword are you talking about ? The one embedded in Rhasspy named jarvis.umdl ?

No it’s a personal wakeword

Ok, I understand.

I have also managed to achieve good results with a Pi zero and a personal wakeword, thanks to your workaround on the webserver.

I’m currently trying to set up 2 different models based on the same wakeword recorded by 2 different people and I’m dealing with some difficulties. It’s quite difficult to find good settings in order to trigger only one wakeword and not the second.

Did you manage to do that? I think it only needs fine tuning, but that requires a lot of tests and my wife doesn’t have my patience… :grin:

I am currently running a satellite with the Snowboy standard model as the wake word and it is… not really satisfying. Sometimes it reacts instantly, sometimes it takes something like 30 seconds to react, and if I tried it multiple times in between it ends up in multiple reactions, presenting me with a bing bing orchestra.

Standard (universal) models are usually too big, depending on your HW… I recommend you to use a personal wake word. It should make a big difference :slight_smile:

It is a Pi Zero, forgot to mention that.

Snips was running fine on it, but I wanted to create my own wakeword anyway, so I will give it a try.

I think the only Snowboy universal model usable on a Pi0 is « Snowboy ». All the others use applyFrontend, which uses much more resources.

Personal wake words will be much more CPU friendly.

Currently experimenting with a personal model, tricky. Must create a better recording ^^

Question: Can’t I just set Hotword on the satellite to MQTT and let the master perform the detection?

You can, but then your satellites have to continuously stream audio data to the master so it can detect the hotword.
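
As a rough idea of what "continuously stream" means in numbers, here is a back-of-the-envelope sketch. It assumes raw 16 kHz, 16-bit, mono audio and ignores MQTT and WAV header overhead, so the real figure will be somewhat higher.

```python
def stream_bandwidth(sample_rate=16000, sample_width_bytes=2, channels=1):
    """Return (bytes per second, kilobits per second) for raw PCM audio."""
    bytes_per_second = sample_rate * sample_width_bytes * channels
    return bytes_per_second, bytes_per_second * 8 / 1000


bps, kbps = stream_bandwidth()
print(f"{bps} bytes/s, about {kbps:.0f} kbit/s per satellite")
```

Under those assumptions each satellite sends 32000 bytes/s (about 256 kbit/s) around the clock, which is small for wired LAN but worth keeping in mind with several satellites on Wi-Fi.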