Wake word detection rate

j3mu5 · February 3, 2020, 10:16am

Hello, everybody,

Since I started doing speech recognition with Kaldi, I am absolutely thrilled by how accurate the recognition is compared to Pocketsphinx: well over 95%.

Now there is only one problem with my Rhasspy: the sensitivity of the wake word.

Porcupine is only an option with the freely available keywords (https://github.com/Picovoice/porcupine/tree/master/resources/keyword_files/raspberrypi); retraining every 30 days is not an option for me.
Snowboy does not work satisfactorily with a small self-created training data set. Instead you have to download widely trained models from https://snowboy.kitt.ai/dashboard.

For both wake word systems I get either too many false positives (when music or video is playing, an action is often triggered afterwards) or the sensitivity is so low that the wake word is only rarely recognized. Normally I set the sensitivity high and reduce it a bit every time a false positive is triggered.

So now the questions:

Is there a better approach to the wake word that I haven’t seen before? (e.g. a possibility to import a lot of training data into a wake word system yourself?)
If there is no other wake word solution at the moment: @synesthesiam, is it possible to set a minimum confidence for Kaldi, similar to Pocketsphinx? This could at least reduce the unwanted actions (like “okay, I’ll switch off all devices!”) that follow false positives caused by background noise.
@synesthesiam, is it also possible to switch Rhasspy’s wake word detection off and on from Home Assistant? I tried setting the microphone volume to 0% using the console and alsamixer (which could be done from Home Assistant), but Rhasspy ignores this setting, no matter whether pyaudio or arecord is selected in the Rhasspy settings. This would be useful if you want to make Rhasspy temporarily deaf during a movie.

PS: As loudspeaker / microphone I use a Jabra 710, which is located quite centrally in the room.
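
To illustrate the minimum-confidence idea from the second question: the sketch below drops a recognized intent when its confidence is too low, before any action is triggered. It is only a client-side illustration, not a Kaldi or Rhasspy setting. The threshold value, the function name `accept_intent`, the intent name `SwitchOffAll`, and the assumption that the result carries an `intent` object with `name` and `confidence` fields are all hypothetical.

```python
from typing import Any, Dict, Optional

# Hypothetical threshold; it would need tuning for a real setup.
MIN_CONFIDENCE = 0.8


def accept_intent(result: Dict[str, Any],
                  min_confidence: float = MIN_CONFIDENCE) -> Optional[str]:
    """Return the intent name if its confidence is high enough, else None.

    Assumes `result` looks like {"intent": {"name": ..., "confidence": ...}}.
    A missing confidence is treated as 0.0, so the intent is rejected.
    """
    intent = result.get("intent", {})
    name = intent.get("name")
    confidence = float(intent.get("confidence", 0.0))
    if name and confidence >= min_confidence:
        return name
    return None


# A low-confidence "switch everything off" is ignored, a confident one passes.
print(accept_intent({"intent": {"name": "SwitchOffAll", "confidence": 0.55}}))
print(accept_intent({"intent": {"name": "SwitchOffAll", "confidence": 0.93}}))
```

A filter like this would only limit the damage after a false wake word trigger; it does not make the wake word detection itself any more reliable.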

FredTheFrog · February 3, 2020, 10:51am

I started with a Jabra 410, but added a Seeed Studios 4-mic array. It definitely seemed to improve the recognition, although I still get false positives from the television audio.

j3mu5 · February 3, 2020, 1:01pm

That’s exactly what I see: Many false positives or a lower detection rate with lower sensitivity, so you have to repeat the wake word several times.

I think the speaker is OK. As soon as the hot word is recognized, the recognition of the spoken command with Kaldi works fantastically. Even from other rooms and with background noise, what I say is understood in most cases.

frkos · February 3, 2020, 2:28pm

@j3mu5 did you try snowboy with its audio_gain parameter?
You can use moderate sensitivity but with increased mic volume… It works really well for me. Try to set it to 1.5 for example

https://rhasspy.readthedocs.io/en/latest/wake-word/#snowboy

OC2019OC · February 3, 2020, 2:33pm

I think your main problem with Snowboy is that you’re using the same wake word phrase for multiple people. If you want a custom model for each person, you will have to use different wake word phrases for each individual not just different models of the same phrase. Does this make sense?

j3mu5 · February 4, 2020, 1:52pm

@OC2019OC Yeah, that makes perfect sense. I will probably stick to a wake word that I train on myself as best as I can using only my voice data. For the other family members it will then hopefully fit.

j3mu5 · February 4, 2020, 2:09pm

@frkos:
Great, thanks for the tip! I hadn’t noticed that before, although I often look at the documentation.
I have now settled on the following values and am curious to see what the false positive rate will be with background noise and conversations. The recognition rate is at least sufficient, since the hot word is recognized on the second attempt at the latest:

        "audio_gain": "1.5",
        "sensitivity": "0.4"
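
As a rough illustration of what a gain of 1.5 means for the audio itself, the sketch below multiplies signed 16-bit samples by a gain factor and clips the result. This is not Snowboy's internal code; it is a generic picture of amplitude scaling under the assumption of 16-bit PCM input, and the function name `apply_gain` is made up for the example.

```python
from array import array


def apply_gain(samples: array, gain: float) -> array:
    """Scale signed 16-bit samples by `gain`, clipping to the int16 range."""
    out = array("h")
    for sample in samples:
        scaled = int(round(sample * gain))
        # Values beyond the 16-bit range are clipped, which distorts loud audio.
        out.append(max(-32768, min(32767, scaled)))
    return out


quiet = array("h", [1000, -2000, 30000])
print(list(apply_gain(quiet, 1.5)))  # [1500, -3000, 32767]
```

The last sample shows the trade-off: quiet speech gets louder, but input that is already loud is clipped, so a very high gain is likely to hurt rather than help.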

OC2019OC · February 4, 2020, 2:43pm

If the model you train doesn’t work for other family members (it should be generic enough, though), you can train models for separate phrases for them using their own voices. Someone says “hey pico” for their model and wake word, another says “jarvis” for their model and wake word, another says “supercallifragilisticexpialidocious” for their model and wake word.

You can then filter possible responses based on which wake word was used, as a bonus.
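
A minimal sketch of that filtering idea, assuming the system reports which wake word fired. The table of allowed intents, the intent names, and the function `is_allowed` are hypothetical and only show the shape of the lookup.

```python
from typing import Dict, Set

# Hypothetical mapping from wake word phrase to the intents it may trigger.
ALLOWED_INTENTS: Dict[str, Set[str]] = {
    "hey pico": {"ChangeLightState", "GetTemperature"},
    "jarvis": {"ChangeLightState", "GetTemperature", "SwitchOffAll"},
}


def is_allowed(wake_word: str, intent_name: str) -> bool:
    """Return True if the intent may run for the wake word that was used."""
    # Unknown wake words get an empty set, so nothing is allowed for them.
    return intent_name in ALLOWED_INTENTS.get(wake_word, set())


print(is_allowed("hey pico", "SwitchOffAll"))  # False
print(is_allowed("jarvis", "SwitchOffAll"))    # True
```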

j3mu5 · February 6, 2020, 8:46am

At the moment I am still trying to find a balance for the sensitivity.
Sometimes I have the feeling that Snowboy temporarily doesn’t react (even if you stand directly in front of it and speak loudly), as described in the topic “Snowboy stops working”.
It is as if the wake word detection temporarily freezes.

I’m currently trying to disable the wake word detection via the API. The documentation says:

/api/listen-for-wake-word
POST “on” to have Rhasspy listen for a wake word
POST “off” to disable wake word

With a REST command I can already trigger text to speech from Home Assistant, or trigger the wake word and make Rhasspy listen. But switching off wake word listening as described in the documentation does not work with the following REST command configuration:

url: 'http://IP.IP.IP.IP:PORT/api/listen-for-wake-word'
method: post
payload: 'off'

The following error message is shown in the log:

[INFO:176060099] quart.serving: 172.18.0.1:52030 POST /api/listen-for-wake-word 1.1 500 21 9226
[ERROR:176060092] main: MethodNotAllowed(405)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/quart/app.py", line 1471, in full_dispatch_request
    result = await self.dispatch_request(request_context)
  File "/usr/local/lib/python3.6/dist-packages/quart/app.py", line 1513, in dispatch_request
    raise request_.routing_exception
  File "/usr/local/lib/python3.6/dist-packages/quart/ctx.py", line 45, in match_request
    self.request_websocket.url_rule, self.request_websocket.view_args = self.url_adapter.match()  # noqa
  File "/usr/local/lib/python3.6/dist-packages/quart/routing.py", line 271, in match
    raise MethodNotAllowed(allowed_methods=allowed_methods)
quart.exceptions.MethodNotAllowed: MethodNotAllowed(405)
[INFO:176042561] quart.serving: 172.18.0.1:52012 POST /api/listen-for-wake-word 1.1 500 21 10241

Am I doing something wrong?

synesthesiam · February 7, 2020, 10:29pm

You’re not, sorry about this! That feature will be available in 2.4.18, which I plan to release tonight.

j3mu5 · February 8, 2020, 3:17pm

Hello @synesthesiam,

no need to apologize. I have reason to be grateful: Thank you for adding this feature to Rhasspy! Especially my wife will be very happy - as long as the wake word still triggers false positives too often I can now calm her down by temporarily covering Rhasspy’s ears.

ulno · March 11, 2020, 2:32am

We are at 2.4.19 and we still get MethodNotAllowed when using /api/listen-for-wake-word. Can we enable that somehow ourselves?

Btw, I was successful in disabling the recording using the following:

amixer -D pulse sset Capture nocap # turn recording off
amixer -D pulse sset Capture cap # turn recording on

I use the speak feature with flite and Snowboy wake word detection. It often produces plenty of false recognitions with my own wake words, which then lead to endless loops switching everything in my apartment on, off, or to different percentages.

synesthesiam · March 12, 2020, 9:18pm

I apparently made a typo: it’s /api/listen-for-wake (without a “-word” on the end)
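
With the corrected path, the call from earlier in the thread could be sketched in Python roughly as follows. The helper names and the `text/plain` content type are assumptions made for the example, and the base URL has to be replaced with the address of your own Rhasspy instance; only the path `/api/listen-for-wake` and the `on` / `off` bodies come from this thread.

```python
import urllib.request


def build_wake_request(base_url: str, listen: bool) -> urllib.request.Request:
    """Build the POST request that turns wake word listening on or off."""
    body = b"on" if listen else b"off"
    return urllib.request.Request(
        f"{base_url}/api/listen-for-wake",  # note: no "-word" at the end
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain"},  # assumed content type
    )


def set_wake_listening(base_url: str, listen: bool, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_wake_request(base_url, listen)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `set_wake_listening("http://IP.IP.IP.IP:PORT", False)` with the real address filled in would then be the counterpart of the Home Assistant REST command above, with only the URL path changed.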