Snowboy custom wakeword false positive

KiboOst · January 2, 2020, 12:05pm

Hi,

Does anyone have experience using a custom wakeword with snowboy?

I get a lot of false positives on words that sound nothing like the wakeword. I can avoid that by decreasing sensitivity (I even set audio_gain to 2), but then detection becomes difficult. Compared to Snips custom wakewords, it’s much worse.

I created the custom wakeword on the snowboy website, with a Surface Pro microphone.
Also, I recorded it in the same room as the rpi. It seems it is better to record in another room, so the background noise isn’t the same and doesn’t cause false positives. I will try to record it again.

Anyway, do you have any advice and/or feedback on this? I didn’t find a way to record it on the pi with the respeaker, which may be better than the Surface mic.

romkabouter · January 2, 2020, 12:27pm

I am using snowboy, but no custom wakeword. I am planning to do so.
Snowboy wakeword performs well on my systems.

synesthesiam · January 3, 2020, 8:17pm

I usually just use arecord to record something on the Pi. Use arecord -L to get a list of available devices. A command like this will record it like Rhasspy does:

arecord -D '<DEVICE>' -r 16000 -f S16_LE -c 1 -t wav > /path/to/my.wav

Hit CTRL+C when you’re finished speaking.

KiboOst · January 3, 2020, 8:21pm

How do I use multiple wav files to generate the pmdl file then?

KiboOst · January 3, 2020, 8:30pm

Ah, the FAQ says that we should always record the wakeword with the same microphone that will have to detect it!

http://docs.kitt.ai/snowboy/#my-trained-model-works-well-on-laptops-but-not-on-pi-s

Will generate a new one with the pi and respeaker !

synesthesiam · January 3, 2020, 10:40pm

You’d just need to run the command multiple times with /path/to/my-1.wav, /path/to/my-2.wav, etc.
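As a rough sketch of that, the loop below reuses the same arecord settings for each sample. The function names `sample_path` and `record_samples` are made up for illustration, and it assumes a fixed recording length via arecord's `-d` option instead of stopping each take with CTRL+C; adjust the duration to fit your wakeword.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Rhasspy or snowboy.

# Build the output path for sample number $1 inside directory $2,
# following the my-1.wav, my-2.wav naming used above.
sample_path() {
    printf '%s/my-%s.wav\n' "$2" "$1"
}

# Record $3 samples (default 3) from device $1 into directory $2.
# Each take lasts $4 seconds (default 3) because of arecord's -d option.
record_samples() {
    device=$1
    out_dir=$2
    count=${3:-3}
    seconds=${4:-3}

    mkdir -p "$out_dir"
    i=1
    while [ "$i" -le "$count" ]; do
        path=$(sample_path "$i" "$out_dir")
        echo "Recording sample $i to $path for $seconds seconds..."
        # Same format as the single command: 16 kHz, 16-bit, mono WAV.
        arecord -D "$device" -d "$seconds" -r 16000 -f S16_LE -c 1 -t wav > "$path"
        i=$((i + 1))
    done
}
```

After sourcing the file, something like `record_samples '<DEVICE>' /path/to 3` should leave three wav files ready to upload.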

KiboOst · January 3, 2020, 10:41pm

Yes, we need three samples. I recorded them and am uploading the files to the snowboy website to generate the pmdl.

Thanks

KiboOst · January 4, 2020, 10:05am

Hi,

When I set only my custom wakeword for snowboy in my profile json and remove the snowboy.umdl file, I get this message in the rhasspy interface:

snowboy

Why should we need it if it isn’t used?

KiboOst · January 4, 2020, 1:13pm

Well, it is still unusable.

With sensitivity 0.38 it still detected a lot of false positives. The asr even recognized “le sapin en haut” with nobody talking, just some piano playing.
Sensitivity 0.35 is better for false positives, but then the hotword is really hard to detect…

I will try to redo the custom wakeword and try some different settings, but I never had this many problems with snips custom wakewords.

synesthesiam · January 4, 2020, 9:29pm

I don’t know what tech was underneath Snips’ wake word system. A truly open source wake word system is something currently missing from the ecosystem.

koan · January 4, 2020, 9:34pm

Snips is using tract as part of their wake word engine:

fastjack · January 4, 2020, 10:33pm

@koan Do you know how to use something like Tract to train a wake word detection system?

It seems that to get a good recognition rate for a universal hotword, you need lots of recordings in the dataset, augmented with lots of additional noise variations, etc. Exactly like for an acoustic model, but for a very specific word.

For a custom wake word, this can be challenging with just a few recordings.

If snowboy and porcupine (which are not really open source) do not cut it, how can we do it?

adrianofoschi · January 5, 2020, 8:43am

+1, I have the same problem with a custom word or “jarvis”.

KiboOst · January 5, 2020, 11:23am

I have switched the microphone from pyaudio to arecord in settings and it seems a lot better!
I can raise the sensitivity without getting more false positives.

Also, something I learned from discussing custom wakewords with the snips guys: we should not record custom wakeword samples in the same room where the device will be. Each room has its own background noise, and having this same noise in the wakeword samples can generate more false positives.
So, we should record custom wakeword samples on the same device/mic that will detect the wakeword, but in a different room, ideally the quietest one, where you will never use the master or a satellite. That way you can reuse custom wakewords on every rhasspy device with fewer problems.

koan · January 5, 2020, 7:11pm

No, unfortunately I’m not that deep (anymore) in language technology, and my interest has always been more in symbolic systems than in neural networks. But that won’t help us here…

synesthesiam · January 5, 2020, 10:03pm

Funny, I’m more of a symbolic guy too. We need to find someone who really knows wake word systems…

Someone who wants to benefit humanity, but make $0

thomas_cologne · January 5, 2020, 10:26pm

Does it make sense to contact:

Just an idea …

synesthesiam · January 6, 2020, 3:30am

Contact Snips? Or that user specifically?

thomas_cologne · January 6, 2020, 8:55am

Well, I don’t think that Snips is willing to help, but maybe Sonos?
No, just kidding!

But this user was a maker for Snips, and I know that he was also very upset about the surprise sale to Sonos.
Maybe he would be interested in helping?!
I don’t know.

ulno · March 12, 2020, 3:24am

What about Mycroft Precise? It’s a bit messy to set up, but we should be able to do our own training. Maybe we can even agree on some wakewords as a community and generate enough “noise” to train them?