Since I started doing speech recognition with Kaldi, I have been absolutely thrilled by how accurate the recognition is compared to Pocketsphinx: well over 95%.
Now there is only one problem with my Rhasspy: the sensitivity of the wake word.
- Porcupine is only an option with the freely available keywords (https://github.com/Picovoice/porcupine/tree/master/resources/keyword_files/raspberrypi); retraining every 30 days is not an option for me.
- Snowboy does not work satisfactorily with a small, self-created training data set. You have to download widely trained models from https://snowboy.kitt.ai/dashboard instead.
With both wake word systems I get either too many false positives (when music or a video is playing, an action is often triggered afterwards), or the sensitivity is so low that the wake word is only rarely recognized. Normally I set the sensitivity high and reduce it a bit every time a false positive is triggered.
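To make that tuning routine concrete, here is a minimal sketch in Python. It is not tied to Porcupine or Snowboy; the function name, the step size, and the lower bound are my own assumptions, and the event list just stands in for "was this trigger a false positive?".

```python
def tune_sensitivity(start, step, floor, false_positive_events):
    """Start high and lower the sensitivity by `step` after each false positive.

    `floor` is an assumed lower bound so the wake word stays recognizable.
    Returns the sensitivity after each observed trigger, starting value first.
    """
    sensitivity = start
    history = [sensitivity]
    for is_false_positive in false_positive_events:
        if is_false_positive:
            # Round to avoid accumulating floating-point noise like 0.7000000001.
            sensitivity = max(floor, round(sensitivity - step, 2))
        history.append(sensitivity)
    return history


# Example: four false positives and one genuine trigger.
history = tune_sensitivity(0.8, 0.1, 0.5, [True, False, True, True, True])
print(history)
```

The floor is the weak point: once it is reached, the only choices left are accepting false positives or missing the wake word, which is exactly the trade-off I am stuck with.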
So now the questions:
- Is there a better approach to the wake word that I haven’t seen before? (e.g. a possibility to import a lot of training data into a wake word system yourself?)
- If there is no other wake word solution at the moment: @synesthesiam, is it possible to set a minimum confidence for Kaldi, similar to Pocketsphinx? This could at least reduce the unwanted actions (like “okay, I’ll switch off all devices!”) that follow false positives caused by background noise.
- @synesthesiam, is it also possible to switch Rhasspy's wake word detection off and on from Home Assistant? I tried setting the microphone volume to 0% from the console with alsamixer (which could be done from Home Assistant), but Rhasspy ignores this setting, no matter whether PyAudio or arecord is selected in the Rhasspy settings. This would be useful for making Rhasspy temporarily deaf during a movie.
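What I mean by a minimum confidence is roughly the following gate, sketched in Python. The shape of the result (a dict with `text` and `confidence` keys) and the threshold value are assumptions for illustration only; I do not know what the Kaldi integration actually exposes.

```python
def should_execute(result, min_confidence=0.8):
    """Return True only if the recognition result is confident enough to act on.

    `result` is a hypothetical dict such as {"text": "...", "confidence": 0.93}.
    A missing confidence is treated as 0.0, so the action is skipped.
    """
    confidence = result.get("confidence", 0.0)
    return confidence >= min_confidence


# Example: a clear command passes, background noise after a false wake does not.
clear_command = {"text": "turn on the living room lamp", "confidence": 0.93}
background_noise = {"text": "switch off all devices", "confidence": 0.41}

print(should_execute(clear_command))
print(should_execute(background_noise))
```

With a gate like this, a false wake word trigger during music would still start a recognition, but a low-confidence transcription would be dropped instead of being turned into an action.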
PS: As loudspeaker/microphone I use a Jabra 710, which is located quite centrally in the room.