Over the last week I’ve been playing with Rhasspy 2.5.0-pre, using Snowboy for wake word detection on my Pi Zero satellite (running Buster) connected to a Jabra 410. For information, my Rhasspy master (base) runs in a Debian 9 Stretch VM under Proxmox (where my Snips master is also still running).
Although everything mostly works so far, I’m still disappointed by Snowboy’s accuracy compared to the Snips hotword detector from snips-satellite. I’m using exactly the same hardware, so there is no reason for the weakness to come from there.
With Snips, I have low CPU usage (around 15%) and excellent detection, even at 3-4 meters from the Jabra and without speaking particularly loudly. When my apartment is very quiet, I’m even detected by several Snips satellites (including one 7-8 meters away).
With Snowboy, performance is not stable, although it’s hard to describe precisely.
In a quiet room, it usually works well for the first 2 or 3 detections. If I say the wake word again some time later, it’s rarely detected, and I have to repeat it with no guarantee of success.
In a noisier room (living room with kids playing), I get some (and sometimes a lot of) false detections, and if I lower the sensitivity, I’m no longer detected at all when I say the wake word.
With the Rhasspy web server disabled, the CPU load is quite OK: 15-20%, mainly due to the snowboy process, which is the expected load (discussed in another post). But the overall results are still disappointing.
So far, I have experimented a lot along the following tuning axes:
- wake word - I tried “Jarvis”, “Hey snips” and “Ok google” and did not see much difference in performance among the three
- loudness at recording - I tried recording the wake word between 20 cm and 1.5 meters from the Jabra, with no major difference
- wake word sample optimization - Even though SnowboyCustomMaker by @KiboOst does a good job, I noticed that it sometimes trims a bit too much at both the start and the end. So I also tried arecord + Audacity with noise reduction and amplification, following the advice of @ced_cox on his blog
- sensitivity setting - between 0.35 and 0.5, but I couldn’t find a good trade-off between missed detections and false detections
- audio_gain - between 1 and 3, with no clear impact on that same trade-off
- alsamixer capture setting - between 0 and +6 dB (the Jabra 410 has 3 dB steps, so the value can be 0, 3 or 6)
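Since the alsamixer setting is expressed in dB while Rhasspy’s audio_gain is a plain number, it helps to put both on the same scale. Below is a small Python sketch, assuming audio_gain acts as a linear amplitude multiplier on the samples before detection (my assumption, not verified against the Snowboy source); the helper names are hypothetical:

```python
import math


def db_to_amplitude(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (db / 20)


def amplitude_to_db(factor: float) -> float:
    """Convert a linear amplitude factor to a gain in decibels."""
    return 20 * math.log10(factor)


if __name__ == "__main__":
    # alsamixer capture steps available on the Jabra 410
    for db in (0, 3, 6):
        print(f"alsamixer +{db} dB -> x{db_to_amplitude(db):.2f}")
    # audio_gain values tried in Rhasspy (assumed to be a linear multiplier)
    for gain in (1, 2, 3):
        print(f"audio_gain {gain} -> +{amplitude_to_db(gain):.1f} dB")
```

Under that assumption, +3 dB is about x1.41 and +6 dB about x2.00, while audio_gain 3 is roughly +9.5 dB, i.e. more than the Jabra’s top capture step (and the two gains simply add in dB when combined). Note that a purely digital gain scales speech and background noise alike, so it cannot improve the signal-to-noise ratio, which may be part of why it barely moves the trade-off.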
On the Snowboy website, I read that USB microphones (the Jabra 410 should fall into this category) tend to record at a low volume. I would agree, because I have the feeling that I get slightly better results when increasing the alsamixer capture gain, but this is still not at the level of the Snips hotword’s out-of-the-box (default settings) performance.
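To check whether the USB mic really records quietly, one option is to record a short test clip of the wake word with arecord (for example `arecord -D plughw:1,0 -f S16_LE -r 16000 -c 1 -d 5 test.wav`, where the `plughw:1,0` device name is only a guess for a Jabra on card 1) and measure its level. This is a minimal sketch using only the Python standard library, assuming a 16-bit PCM WAV; `wav_levels` is a hypothetical helper:

```python
import array
import math
import sys
import wave
from typing import Tuple


def wav_levels(source) -> Tuple[float, float]:
    """Return (peak_dbfs, rms_dbfs) for a 16-bit PCM WAV file.

    `source` can be a file path or a binary file object (wave.open accepts both).
    All channels are pooled together.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples (arecord -f S16_LE)")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if len(samples) == 0:
        raise ValueError("recording contains no samples")

    full_scale = 32768.0  # magnitude of the most negative 16-bit value
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))

    # Clamp to avoid log10(0) on a completely silent file
    peak_dbfs = 20 * math.log10(max(peak, 1) / full_scale)
    rms_dbfs = 20 * math.log10(max(rms, 1e-9) / full_scale)
    return peak_dbfs, rms_dbfs


if __name__ == "__main__":
    peak_db, rms_db = wav_levels(sys.argv[1])
    print(f"peak {peak_db:.1f} dBFS, RMS {rms_db:.1f} dBFS")
```

Saved as e.g. `levels.py` and run with `python3 levels.py test.wav`, it prints the peak and RMS level relative to full scale (0 dBFS). Recording the same phrase at the same distance with the 0, +3 and +6 dB capture settings would give a concrete view of how much headroom is being left unused.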
I’d be interested in your experience and recommendations on how to optimize this, and whether you find your Snowboy performance “really good” or just “good but could be better”…
By the way, I’m also a bit surprised that Kitt.ai has decided to shut down all their products (including Snowboy) at the end of the year. Since we’re already in May, I’m afraid my optimization efforts won’t be worth it if we all have to switch to another tool for wake word detection…
Sorry for the long message… your help is appreciated